Are non-semantic morphological effects incompatible with a ...

41
Are non-semantic morphological effects incompatible with a distributed connectionist approach to lexical processing? David C. Plaut Departments of Psychology and Computer Science and the Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, USA Laura M. Gonnerman Department of Psychology and the Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, USA On a distributed connectionist approach, morphology re ects a learned sensitivity to the systematic relationships among the surface forms of words and their meanings. Performance on lexical tasks should thus exhibit graded effects of both semantic and formal similarity. Although there is evidence for such effects, there are also demonstrations of morphological effects in the absence of semantic similarity (when formal similarity is controlled) in morphologically rich languages like Hebrew. Such ndings are typically interpreted as being problematic for the connectionist account. To evaluate whether this interpretation is valid, we carried out simulations in which a set of morphologically related words varying in semantic transparency were embedded in either a morphologically rich or impoverished arti cial language. We found that morphological priming increased with degree of semantic transparency in both languages. Critically, priming extended to semantically opaque items in the morphologically rich language (consistent with ndings in Hebrew) but not in the impoverished language (consistent Requests for reprints should be addressed either to David Plaut ([email protected]) or to Laura Gonnerman ([email protected]), Mellon Institute 115–CNBC, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh PA 15213–2683. The research was supported by an NIMH FIRST award (MH55628) to the rst author, by an NIMH NRSA Grant (DC00374) to the second author, and by NIMH Program Project Grant MH47566 (J. McClelland, PD). The computational simulations were run using customised software written within the Xerion simulator (version 3.1) developed by Drew van Camp, Tony Plate, and Geoff Hinton at the University of Toronto. We thank Marlene Behrmann, Steve Joordens, Jay McClelland, Brian MacWhinney, Jay Rueckl, and the CMU PDP research group for helpful comments and discussion. c 2000 Psychology Press Ltd http://www.tandf.co.uk/journals/pp/01690965.html LANGUAGE AND COGNITIVE PROCESSES, 2000, 15 (4/5), 445–485

Transcript of Are non-semantic morphological effects incompatible with a ...

Page 1: Are non-semantic morphological effects incompatible with a ...

Are non-semantic morphological effectsincompatible with a distributed connectionist

approach to lexical processing

David C PlautDepartments of Psychology and Computer Science and the Center for theNeural Basis of Cognition Carnegie Mellon University Pittsburgh USA

Laura M GonnermanDepartment of Psychology and the Center for the Neural Basis of

Cognition Carnegie Mellon University Pittsburgh USA

On a distributed connectionist approach morphology reects a learnedsensitivity to the systematic relationships among the surface forms of wordsand their meanings Performance on lexical tasks should thus exhibit gradedeffects of both semantic and formal similarity Although there is evidence forsuch effects there are also demonstrations of morphological effects in theabsence of semantic similarity (when formal similarity is controlled) inmorphologically rich languages like Hebrew Such ndings are typicallyinterpreted as being problematic for the connectionist account To evaluatewhether this interpretation is valid we carried out simulations in which a setof morphologically related words varying in semantic transparency wereembedded in either a morphologically rich or impoverished articiallanguage We found that morphological priming increased with degree ofsemantic transparency in both languages Critically priming extended tosemantically opaque items in the morphologically rich language (consistentwith ndings in Hebrew) but not in the impoverished language (consistent

Requests for reprints should be addressed either to David Plaut (plautcmuedu) or toLaura Gonnerman (lauragcnbccmuedu) Mellon Institute 115ndashCNBC Carnegie MellonUniversity 4400 Fifth Avenue Pittsburgh PA 15213ndash2683

The research was supported by an NIMH FIRST award (MH55628) to the rst author byan NIMH NRSA Grant (DC00374) to the second author and by NIMH Program ProjectGrant MH47566 (J McClelland PD) The computational simulations were run usingcustomised software written within the Xerion simulator (version 31) developed by Drew vanCamp Tony Plate and Geoff Hinton at the University of Toronto We thank MarleneBehrmann Steve Joordens Jay McClelland Brian MacWhinney Jay Rueckl and the CMUPDP research group for helpful comments and discussion

c 2000 Psychology Press Ltdhttpwwwtandfcoukjournalspp01690965html

LANGUAGE AND COGNITIVE PROCESSES 2000 15 (45) 445ndash485

446 PLAUT AND GONNERMAN

with ndings in English) Such priming arises because the processing of allitems including opaque forms is inuenced by the degree of morphologicalorganisation of the entire system These ndings suggest that rather thanbeing challenged by the occurrence of non-semantic morphological effects inmorphologically rich languages the connectionist approach may provide anexplanation for the cross-linguistic differences in the occurrence of theseeffects

INTRODUCTION

Language is among the most fundamental of human cognitive abilities andone of its most remarkable properties is its productivity a relatively smallnumber of familiar elements can be recombined in highly exible ways toexpress a virtually limitless range of ideas This productivity stems bothfrom morphology (ie how words are formed) and from syntax (ie howthey are ordered into sentences) Languages differ widely in the extent towhich they rely on morphology versus syntax for productivity and howmorphology is expressed in the written and spoken forms of words Indeedit is generally agreed that morphology is the area of language that exhibitsthe greatest degree of cross-linguistic variation and consequently thatlearning must play a particularly important role in its organisation

In these respects English and Hebrew constitutes a very informativecontrast English makes relatively limited use of morphology and theprocess of word formation mostly involves simple concatenation (egUNBREAKABLE = UN- + BREAK + -ABLE see Marchand 1969) In Hebrew bycontrast almost every word is structurally complex and word formationinvolves both concatenative and non-concatenative processes includinginterdigitating a largely vowel-based word pattern into a three-consonantroot (eg ZMR [related to music] + _ I _ _ A = lsquolsquozimrarsquorsquo [singing] seeBerman 1978) Thus it is a considerable challenge to theories of languageto explain how a child can learn either English or Hebrew (or both)depending only on the language environment in which he or she grows up

The standard view of morphology is that words are built out of discreteunits called morphemes that contribute systematically to the meanings ofwords containing them (eg RE- typically means to do again as in REW ASHRESEAL etc) Morphological systematicity is important in part because itunderlies the formation of novel words For example given the relativelynew English verb FAX we can automatically say and understand words likeREFAXED and UNFAXABLE In fact the same holds for novel verbs regardlessof what the verb W UG might mean something is UNW UGGABLE if it cannotbe W UGGED

Critically as in other linguistic domains systematicity in morphology isonly partial For example although most English verbs form the past tenseby adding -ED there are also about 150 irregular verbs that form clusters

A CONNECTIONIST APPROACH TO MORPHOLOGY 447

which undergo similar changes (eg SIN G ) lsquolsquosangrsquorsquo DRIN K ) lsquolsquodrankrsquorsquo)along with a few very high-frequency arbitrary forms (eg GO ) lsquolsquowentrsquorsquo)A similar state of affairs holds for other aspects of morphology (seeMarchand 1969) For instance given an English verb the most productiveway of deriving the word for one who does the action is to add -ER thussomeone who RUN S is a RUNN ER someone who THINKS is a THINKER and soon Such words are transparent because their meaning can be deriveddirectly from the meanings of their morphemes However there are alsoopaque items someone who ndash IES a plane is not a ndash IER but a PILOT aCORN ER is not someone who CORNS

Many theories of morphological processing (eg Burani amp Caramazza1987 Marslen-Wilson Tyler Waksler amp Older 1994 Pinker 1991Prasada amp Pinker 1993 Schreuder amp Baayen 1995) respond to the factthat morphology is only partially systematic by handling the systematic andidiosyncratic aspects separately Idiosyncratic words are stored asunanalysed wholes whereas systematic words are represented andprocessed compositionally in terms of stems and afxes Recognition ofthe latter thus involves decomposing a complex word into its identiablestem and afx and then accessing the stored stem (Marslen-Wilson et al1994) Such dual-mechanism theories run into problems however becausemorphological structure is not just partial but gradedmdashthat is to say thereare many intermediate cases that are neither completely transparent(regular) nor completely opaque (irregular) Rather there are sub-regularities among the irregulars and between them and regulars Forexample the spoken forms of irregular verbs that do not change betweenpresent and past tense (eg HIT SPREAD) all end in either t or dmdashthesame phonemes that occur at the end of regular past-tense forms Thusthese items are not arbitrary exceptions the fact that they already sharethe phonological structure of a past-tense form contributes to theirpatterning Similarly a word like DRESSER is neither fully semanticallytransparent nor fully opaquemdashalthough a DRESSER is not someone whodresses but a piece of furniture that holds clothes it is clearly related to theactivity of dressing More generally the isolation of idiosyncratic fromsystematic knowledge which is at the core of dual-mechanism theories andwhich seems necessary to prevent the exceptions from compromisinggeneralisation to novel forms ends up being a serious handicap in the faceof the rich graded structure of language

These difculties have prompted many researchers to explore analternative way of expressing language knowledge based on connectionistor neural network modelling that can handle graded degrees ofsystematicity more naturally Connectionist models attempt to capturethe essential properties of the neural mechanisms that give rise tobehaviour by implementing cognitive processes in terms of cooperative

448 PLAUT AND GONNERMAN

and competitive interactions among large groups of simple neuron-likeunits (see McClelland Rumelhart amp PDP Research Group 1986McLeod Plunkett amp Rolls 1998 Quinlan 1991 Rumelhart McClellandamp PDP Research Group 1986b)

From a connectionist perspective morphology is a characterisation ofthe learned mapping between the surface forms of words (orthographyphonology) and their meanings (semantics) (see also Cottrell amp Plunkett1995 Gonnerman Devlin Andersen amp Seidenberg 2000b Hoeffner ampMcClelland 1993 Joanisse amp Seidenberg 1999 Rueckl MikolinskiRaveh Miner amp Mars 1997 Rueckl amp Raveh 1999) To the extent thata particular surface pattern occurs in many words and maps consistently tocertain aspects of meaning the internal representations will come to reectthis structure and treat the pattern componentiallymdashthat is represent andprocess it relatively independently of the other parts of the word (Plaut ampMcClelland 1993 Plaut McClelland Seidenberg amp Patterson 1996) Assurface patterns behave less systematically their internal representationswill be correspondingly less componential At the extreme end if an itemviolates what is otherwise a systematic aspect of the domain (eg CORN ER)its internal representation must be very non-componential in order toavoid being given a transparent interpretation (eg someone who CORN S)

To make the connectionist approach to morphology more concreteconsider the framework for lexical processing depicted in Figure 1 (seealso Gonnerman et al 2000b Joanisse amp Seidenberg 1999) Theorthography phonology and semantics of words are represented as

Figure 1 A connectionist framework for lexical processing The large arrows depict inputsand outputs of the system

A CONNECTIONIST APPROACH TO MORPHOLOGY 449

distributed patterns of activity over separate groups of processing unitsWithin each of these domains similar words are represented by similar(overlapping) patterns of activity For example CAT and DOG would beencoded by similar patterns in semantics but by dissimilar patterns inorthography and phonology whereas CAT and CAP would be similar inorthography and phonology but dissimilar in semantics Lexical tasksinvolve transformations between these patternsmdashfor example writtenword comprehension requires that the orthographic pattern for a wordgenerate the appropriate semantic pattern Such transformations areaccomplished via interactions among the units including additionalintermediate or hidden units that mediate between the other groups ofunits1 Unit interactions are governed by weighted excitatory andinhibitory connections between them When the network is presentedwith input (eg the orthography of CAT) units repeatedly update theiractivations as a function of the total input they receive via connectionsfrom other units until the network as a whole settles into a stable patternof activity constituting its response to the input (eg the meaning andpronunciation of CAT) The behaviour of the network in response to a giveninput thus depends on the specic values of the connection weights whichcollectively encode the knowledge of the system The network learns thesevalues by gradually adjusting the weights to improve performance basedon exposure to written words spoken words and their meanings Thislearning has the effect of inducing new internal representations over theintermediate units that are effective in accomplishing the relevant tasks

At a general level the reliance within the connectionist approach onlearning internal representations is in sharp contrast to more traditionalapproaches to language which typically make very strong assumptionsabout the structure of internal representations and the processes thatmanipulate them For example it is often assumed that underlyinglinguistic knowledge takes the form of explicit rules which operate overdiscrete symbolic representations (Chomsky 1957 Chomsky amp Halle1968 Fodor amp Pylyshyn 1988 Pinker 1991 Prasada amp Pinker 1993) andmoreover that this knowledge is in large part innately specied(Chomsky 1965 Crain 1991 Pinker 1994) By contrast rather thanstipulating the specic form and content of the knowledge required forperformance in a domain the connectionist approach instead species the

1 Readers may note that the current framework differs from the lsquolsquotrianglersquorsquo model of wordreading (eg Harm amp Seidenberg 1999 Plaut et al 1996 Seidenberg amp McClelland 1989) inhaving one rather than three groups of hidden units mediating among orthographyphonology and semantics Although this distinction is a matter of ongoing research Kelloand Plaut (in press) report strategic effects in word reading that seem problematic for thetriangle framework but can be simulated effectively by a model with a common group ofhidden units (Kello amp Plaut 1998 in preparation)

450 PLAUT AND GONNERMAN

tasks the system must perform but then leaves it up to the learning processto develop the necessary internal representations and processes (McClel-land St John amp Taraban 1989) The result is often that learning developsrepresentations and processes which are radically different from thoseproposed by traditional theories resulting in novel hypotheses and testablepredictions concerning human cognitive behaviour

In particular the connectionist approach to morphology makes thestrong prediction that the magnitudes of behavioural effects that reectmorphological processing should vary continuously as a function of thedegree of semantic and phonological transparency of words Thisprediction was recently conrmed in English by Gonnerman Andersenand Seidenberg (2000a) who found that prime-target pairs withintermediate degrees of semantic or phonological transparency producedintermediate degrees of priming

Other empirical ndings however would seem to challenge theconnectionist claim that morphology is derived from learning amongorthography phonology and semantics In particular it seems difcult toreconcile this perspective with demonstrations of morphological effectsthat are independent of both semantic and surface similarity Many ofthese demonstrations are in languages such as Hebrew that have muchricher morphological systems than English For example morphologicalpriming in Hebrew can occur in the absence of semantic priming or can beinsensitive to variations in semantic relatedness when surface similarity iscontrolled (Bentin amp Feldman 1990 Frost Deutsch amp Forster in press-aFrost Deutsch Gilboa Tannenbaum amp Marslen-Wilson in press-b FrostForster amp Deutsch 1997) It thus appears that the existence of primingunder these conditions is governed by morphological relatedness per seand not by semantic or surface similarity If according to the connectionistapproach morphological structure ultimately derives from gradedsystematicity among surface forms of words and their meanings therewould seem to be no basis for morphological effects that are independentof semantic and surface similarity

The current article reports on two simulations designed to evaluatewhether the existence of morphological priming in the absence of semanticsimilarity in morphologically rich languages is inconsistent with adistributed connectionist approach to morphology A common set ofprimes and targets ranging in transparency were embedded among alarger set of words that were either transparent or opaque forming twoarticial languages a morphologically rich language analogous to Hebrewand a morphologically impoverished language analogous to English Anetwork was then trained to map orthography to semantics for eitherlanguage and then tested for priming Both languages yielded morpho-logical priming for the more transparent targets but priming extended to

A CONNECTIONIST APPROACH TO MORPHOLOGY 451

semantically opaque targets only in the rich language The results suggestthat non-semantic morphological priming is in fact compatible with theconnectionist approach but that its occurrence is predicted to depend onthe degree of morphological richness of the language

The next section reviews the relevant empirical ndings that support orchallenge a connectionist approach to morphology We then discuss thecomputational principles that are essential for understanding both thegeneral approach and the basis for the results of the simulations whichfollow We conclude with a more general discussion of the strengths andlimitations of the current work as well as remaining challenges to theconnectionist approach

OVERVIEW OF RELEVANT EMPIRICAL FINDINGS

One of the earliest studies on the role of morphology in lexical processingwas conducted by Taft and Forster (1975) Using a lexical decision taskthey found that subjects were able to reject nonwords with pseudo-stems(eg DEPERTOIRE) faster than nonwords with real stems (eg DEJUVEN ATE)Based on these ndings Taft and Forster argued for a processing strategythat rst strips afxes from complex words and then searches for a storedstem representation Many subsequent studies using a variety ofexperimental techniques have examined the extent to which morpholo-gical information is used to decompose complex words in lexicalprocessing (see Feldman 1995) These studies have produced conictingresults leading to disagreement about the precise nature and locus ofmorphological decomposition For example Andrews (1986) argued thatcompound words are decomposed in lexical access sufxed words areoptionally decomposed in access and all types of complex words arerepresented as whole forms Alternatively Stanners Neiser Hernon andHall (1979) claimed that derived forms are processed as wholes butinected forms are decomposed Another account proposes that prexedwords are processed as whole forms but that sufxed words aredecomposed (Cole Beauvillain amp Segui 1989) Yet another approachadvocates two types of lexical access proceduresmdashwhole word anddecompositionalmdashwhich are activated in parallel (eg Burani Salmasoamp Caramazza 1984 Laudanna Badecker amp Caramazza 1989 1992Laudanna Cermele amp Caramazza 1997) While there has been noconsensus as to exactly when and where morphological processing occursthe majority of researchers agree that some form of discrete all-or-nonedecomposition based on morphological structure operates somewherewithin the lexical system

452 PLAUT AND GONNERMAN

The connectionist approach by contrast holds that decomposition is notan all-or-none phenomenon and that behavioural effects should be gradedreecting the degree of convergence among semantic phonological andorthographic codes One source of evidence for continuous gradedsimilarity structure in the semantic relatedness of morphologically complexwords and their stems comes from subjectsrsquo similarity judgements(Gonnermann 1999 Gonnerman et al 2000a) Gonnerman andcolleagues found that subjects produced highly consistent ratings that fellon a continuum judging some word pairs to be unrelated (eg CORNERndashCORN) some moderately related (eg DRESSERndashDRESS) and some highlyrelated (eg TEACHERndashTEACH) With regard to formal similarity Rueckl etal (1997) present ndings from both masked and standard fragmentcompletion tasks in which priming effects for morphologically related wordpairs varied in magnitude as a function of orthographic similarity

Further empirical support for the connectionist perspective comes froma series of cross-modal priming studies in which subjects made lexicaldecisions to written targets preceded by auditorily presented primes(Gonnerman 1999 Gonnerman et al 2000a see also Marslen-Wilson etal 1994) Gonnerman and colleagues compared the magnitude ofpriming as a function of the degree of semantic transparency betweenprime and target (eg high BREAKAGEndashBREAK intermediate SHORTAGEndashSHORT low MESSAGEndashMESS) Crucially all of the items were phonologi-cally transparent such that the stems were contained unchanged in thecomplex words Gonnerman and colleagues found that prime-target pairswith intermediate degrees of transparency also produced intermediatedegrees of priming Thus highly semantically-related word pairs (egBAKERndashBAKE) primed twice as much (40 vs 19 ms) as moderately relatedpairs (eg DRESSERndashDRESS) and unrelated pairs (eg CORNERndashCORN)showed no priming effects Strikingly similar results were also obtainedfor pairs of prexed words and their stems highly related pairs (egPREHEATndashHEAT) primed twice as much (42 vs 20 ms) as moderatelyrelated pairs (eg MIDSTREAMndashSTREAM) whereas unrelated pairs (egREHEARSEndashHEARSE) showed no priming Again as with the sufxed-stempairs phonological relatedness was held constant A third experimentshowed that these graded effects hold for derived-derived word pairsHighly related pairs (eg SAINTLYndashSAINTHOOD) produced signicantfacilitation (34 ms) whereas less related pairs (eg OBSERVATIONndashOBSERVANT) produced only moderate facilitatory effects (14 ms) thatfailed to reach signicance

Gonnerman et al (2000a) also found evidence for graded effects ofphonological relatedness among word pairs that were equated on measuresof semantic similarity Thus priming was greater among phonologicallyrelated pairs involving a consonant change (eg DELETIONndashDELETE 65 ms)

A CONNECTIONIST APPROACH TO MORPHOLOGY 453

than among pairs involving a vowel change (eg VAN ITYndashVAIN 48 ms)which in turn was greater than priming among pairs involving both voweland consonant changes (eg INTRODUCTIONndashINTRODUCE 35 ms)

In two nal experiments Gonnerman et al (2000a) found similarpriming effects among word pairs that are both semantically andphonologically similar but morphologically unrelated In one experimentsignicant priming (24 ms) was demonstrated for lsquopseudo-sufxedrsquo wordsthat do not share morphological stems (eg TRIVIALndashTRIndash E) but cruciallyonly if the words overlapped sufciently in both meaning and sound Datafrom a second experiment indicated priming (35 ms) for words taken fromsemantic and phonological clusters where there are no morphologicalrelationships (eg SNARLndashSN EER) Again priming occurred only if therewas sufcient overlap in both meaning and sound SW ITCHndashSW IM pairs didnot prime These ndings conrm that priming effects similar to thosefound for morphologically related items hold for word pairs that arecorrelated in meaning and sound but have no morphological relationships(see also Rueckl amp Dror 1994 for a related demonstration that subjectslearn an artical vocabulary better if formal and semantic similarity arecorrelated)

Thus Gonnerman et al (2000a) provided clear evidence for gradedeffects of similarity in semantics phonology and orthography on cross-modal priming Furthermore the graded semantic effects were shown togeneralise across morphological types holding for morphologically relatedsufxed-stem prexed-stem and sufxed-sufxed pairs The effects ofgraded semantic and phonological similarity were demonstrated even formorphologically unrelated word pairs Taken together these resultssupport a distributed connectionist approach where morphology char-acterises the regularities in the learned mapping between semantics andphonology

There are of course ndings that seem to challenge the notion thatmorphological effects can be reduced entirely to the joint inuence ofsemantic and formal similarity For example with regard to formalsimilarity Murrell and Morton (1974) found priming among formallyrelated word pairs only if they were morphologically related (eg CARSndashCAR but not CARDndashCAR) and concluded that priming is determined bymorphological structure rather than by formal relatedness per se Withregard to semantics Kempley and Morton (1982) found that regularlyinected word pairs (eg REndash ECTEDndashREndash ECTING) produced signicantfacilitation whereas irregularly inected pairs (eg HELDndashHOLDING) didnot They argued on the basis of these results that semantic properties ofwords do not inuence processing whereas morphology does Manyadditional studies have come to a similar conclusion that morphologicalstructure and not semantic or formal properties underlies lexical

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 2: Are non-semantic morphological effects incompatible with a ...

446 PLAUT AND GONNERMAN

with ndings in English) Such priming arises because the processing of allitems including opaque forms is inuenced by the degree of morphologicalorganisation of the entire system These ndings suggest that rather thanbeing challenged by the occurrence of non-semantic morphological effects inmorphologically rich languages the connectionist approach may provide anexplanation for the cross-linguistic differences in the occurrence of theseeffects

INTRODUCTION

Language is among the most fundamental of human cognitive abilities andone of its most remarkable properties is its productivity a relatively smallnumber of familiar elements can be recombined in highly exible ways toexpress a virtually limitless range of ideas This productivity stems bothfrom morphology (ie how words are formed) and from syntax (ie howthey are ordered into sentences) Languages differ widely in the extent towhich they rely on morphology versus syntax for productivity and howmorphology is expressed in the written and spoken forms of words Indeedit is generally agreed that morphology is the area of language that exhibitsthe greatest degree of cross-linguistic variation and consequently thatlearning must play a particularly important role in its organisation

In these respects English and Hebrew constitutes a very informativecontrast English makes relatively limited use of morphology and theprocess of word formation mostly involves simple concatenation (egUNBREAKABLE = UN- + BREAK + -ABLE see Marchand 1969) In Hebrew bycontrast almost every word is structurally complex and word formationinvolves both concatenative and non-concatenative processes includinginterdigitating a largely vowel-based word pattern into a three-consonantroot (eg ZMR [related to music] + _ I _ _ A = lsquolsquozimrarsquorsquo [singing] seeBerman 1978) Thus it is a considerable challenge to theories of languageto explain how a child can learn either English or Hebrew (or both)depending only on the language environment in which he or she grows up

The standard view of morphology is that words are built out of discreteunits called morphemes that contribute systematically to the meanings ofwords containing them (eg RE- typically means to do again as in REW ASHRESEAL etc) Morphological systematicity is important in part because itunderlies the formation of novel words For example given the relativelynew English verb FAX we can automatically say and understand words likeREFAXED and UNFAXABLE In fact the same holds for novel verbs regardlessof what the verb W UG might mean something is UNW UGGABLE if it cannotbe W UGGED

Critically as in other linguistic domains systematicity in morphology isonly partial For example although most English verbs form the past tenseby adding -ED there are also about 150 irregular verbs that form clusters

A CONNECTIONIST APPROACH TO MORPHOLOGY 447

which undergo similar changes (eg SIN G ) lsquolsquosangrsquorsquo DRIN K ) lsquolsquodrankrsquorsquo)along with a few very high-frequency arbitrary forms (eg GO ) lsquolsquowentrsquorsquo)A similar state of affairs holds for other aspects of morphology (seeMarchand 1969) For instance given an English verb the most productiveway of deriving the word for one who does the action is to add -ER thussomeone who RUN S is a RUNN ER someone who THINKS is a THINKER and soon Such words are transparent because their meaning can be deriveddirectly from the meanings of their morphemes However there are alsoopaque items someone who ndash IES a plane is not a ndash IER but a PILOT aCORN ER is not someone who CORNS

Many theories of morphological processing (eg Burani amp Caramazza1987 Marslen-Wilson Tyler Waksler amp Older 1994 Pinker 1991Prasada amp Pinker 1993 Schreuder amp Baayen 1995) respond to the factthat morphology is only partially systematic by handling the systematic andidiosyncratic aspects separately Idiosyncratic words are stored asunanalysed wholes whereas systematic words are represented andprocessed compositionally in terms of stems and afxes Recognition ofthe latter thus involves decomposing a complex word into its identiablestem and afx and then accessing the stored stem (Marslen-Wilson et al1994) Such dual-mechanism theories run into problems however becausemorphological structure is not just partial but gradedmdashthat is to say thereare many intermediate cases that are neither completely transparent(regular) nor completely opaque (irregular) Rather there are sub-regularities among the irregulars and between them and regulars Forexample the spoken forms of irregular verbs that do not change betweenpresent and past tense (eg HIT SPREAD) all end in either t or dmdashthesame phonemes that occur at the end of regular past-tense forms Thusthese items are not arbitrary exceptions the fact that they already sharethe phonological structure of a past-tense form contributes to theirpatterning Similarly a word like DRESSER is neither fully semanticallytransparent nor fully opaquemdashalthough a DRESSER is not someone whodresses but a piece of furniture that holds clothes it is clearly related to theactivity of dressing More generally the isolation of idiosyncratic fromsystematic knowledge which is at the core of dual-mechanism theories andwhich seems necessary to prevent the exceptions from compromisinggeneralisation to novel forms ends up being a serious handicap in the faceof the rich graded structure of language

These difculties have prompted many researchers to explore analternative way of expressing language knowledge based on connectionistor neural network modelling that can handle graded degrees ofsystematicity more naturally Connectionist models attempt to capturethe essential properties of the neural mechanisms that give rise tobehaviour by implementing cognitive processes in terms of cooperative

448 PLAUT AND GONNERMAN

and competitive interactions among large groups of simple neuron-likeunits (see McClelland Rumelhart amp PDP Research Group 1986McLeod Plunkett amp Rolls 1998 Quinlan 1991 Rumelhart McClellandamp PDP Research Group 1986b)

From a connectionist perspective morphology is a characterisation ofthe learned mapping between the surface forms of words (orthographyphonology) and their meanings (semantics) (see also Cottrell amp Plunkett1995 Gonnerman Devlin Andersen amp Seidenberg 2000b Hoeffner ampMcClelland 1993 Joanisse amp Seidenberg 1999 Rueckl MikolinskiRaveh Miner amp Mars 1997 Rueckl amp Raveh 1999) To the extent thata particular surface pattern occurs in many words and maps consistently tocertain aspects of meaning the internal representations will come to reectthis structure and treat the pattern componentiallymdashthat is represent andprocess it relatively independently of the other parts of the word (Plaut ampMcClelland 1993 Plaut McClelland Seidenberg amp Patterson 1996) Assurface patterns behave less systematically their internal representationswill be correspondingly less componential At the extreme end if an itemviolates what is otherwise a systematic aspect of the domain (eg CORN ER)its internal representation must be very non-componential in order toavoid being given a transparent interpretation (eg someone who CORN S)

To make the connectionist approach to morphology more concreteconsider the framework for lexical processing depicted in Figure 1 (seealso Gonnerman et al 2000b Joanisse amp Seidenberg 1999) Theorthography phonology and semantics of words are represented as

Figure 1 A connectionist framework for lexical processing The large arrows depict inputsand outputs of the system

A CONNECTIONIST APPROACH TO MORPHOLOGY 449

distributed patterns of activity over separate groups of processing unitsWithin each of these domains similar words are represented by similar(overlapping) patterns of activity For example CAT and DOG would beencoded by similar patterns in semantics but by dissimilar patterns inorthography and phonology whereas CAT and CAP would be similar inorthography and phonology but dissimilar in semantics Lexical tasksinvolve transformations between these patternsmdashfor example writtenword comprehension requires that the orthographic pattern for a wordgenerate the appropriate semantic pattern Such transformations areaccomplished via interactions among the units including additionalintermediate or hidden units that mediate between the other groups ofunits1 Unit interactions are governed by weighted excitatory andinhibitory connections between them When the network is presentedwith input (eg the orthography of CAT) units repeatedly update theiractivations as a function of the total input they receive via connectionsfrom other units until the network as a whole settles into a stable patternof activity constituting its response to the input (eg the meaning andpronunciation of CAT) The behaviour of the network in response to a giveninput thus depends on the specic values of the connection weights whichcollectively encode the knowledge of the system The network learns thesevalues by gradually adjusting the weights to improve performance basedon exposure to written words spoken words and their meanings Thislearning has the effect of inducing new internal representations over theintermediate units that are effective in accomplishing the relevant tasks

At a general level the reliance within the connectionist approach onlearning internal representations is in sharp contrast to more traditionalapproaches to language which typically make very strong assumptionsabout the structure of internal representations and the processes thatmanipulate them For example it is often assumed that underlyinglinguistic knowledge takes the form of explicit rules which operate overdiscrete symbolic representations (Chomsky 1957 Chomsky amp Halle1968 Fodor amp Pylyshyn 1988 Pinker 1991 Prasada amp Pinker 1993) andmoreover that this knowledge is in large part innately specied(Chomsky 1965 Crain 1991 Pinker 1994) By contrast rather thanstipulating the specic form and content of the knowledge required forperformance in a domain the connectionist approach instead species the

1 Readers may note that the current framework differs from the lsquolsquotrianglersquorsquo model of wordreading (eg Harm amp Seidenberg 1999 Plaut et al 1996 Seidenberg amp McClelland 1989) inhaving one rather than three groups of hidden units mediating among orthographyphonology and semantics Although this distinction is a matter of ongoing research Kelloand Plaut (in press) report strategic effects in word reading that seem problematic for thetriangle framework but can be simulated effectively by a model with a common group ofhidden units (Kello amp Plaut 1998 in preparation)

450 PLAUT AND GONNERMAN

tasks the system must perform but then leaves it up to the learning processto develop the necessary internal representations and processes (McClel-land St John amp Taraban 1989) The result is often that learning developsrepresentations and processes which are radically different from thoseproposed by traditional theories resulting in novel hypotheses and testablepredictions concerning human cognitive behaviour

In particular the connectionist approach to morphology makes thestrong prediction that the magnitudes of behavioural effects that reectmorphological processing should vary continuously as a function of thedegree of semantic and phonological transparency of words Thisprediction was recently conrmed in English by Gonnerman Andersenand Seidenberg (2000a) who found that prime-target pairs withintermediate degrees of semantic or phonological transparency producedintermediate degrees of priming

Other empirical ndings however would seem to challenge theconnectionist claim that morphology is derived from learning amongorthography phonology and semantics In particular it seems difcult toreconcile this perspective with demonstrations of morphological effectsthat are independent of both semantic and surface similarity Many ofthese demonstrations are in languages such as Hebrew that have muchricher morphological systems than English For example morphologicalpriming in Hebrew can occur in the absence of semantic priming or can beinsensitive to variations in semantic relatedness when surface similarity iscontrolled (Bentin amp Feldman 1990 Frost Deutsch amp Forster in press-aFrost Deutsch Gilboa Tannenbaum amp Marslen-Wilson in press-b FrostForster amp Deutsch 1997) It thus appears that the existence of primingunder these conditions is governed by morphological relatedness per seand not by semantic or surface similarity If according to the connectionistapproach morphological structure ultimately derives from gradedsystematicity among surface forms of words and their meanings therewould seem to be no basis for morphological effects that are independentof semantic and surface similarity

The current article reports on two simulations designed to evaluatewhether the existence of morphological priming in the absence of semanticsimilarity in morphologically rich languages is inconsistent with adistributed connectionist approach to morphology A common set ofprimes and targets ranging in transparency were embedded among alarger set of words that were either transparent or opaque forming twoarticial languages a morphologically rich language analogous to Hebrewand a morphologically impoverished language analogous to English Anetwork was then trained to map orthography to semantics for eitherlanguage and then tested for priming Both languages yielded morpho-logical priming for the more transparent targets but priming extended to

A CONNECTIONIST APPROACH TO MORPHOLOGY 451

semantically opaque targets only in the rich language The results suggestthat non-semantic morphological priming is in fact compatible with theconnectionist approach but that its occurrence is predicted to depend onthe degree of morphological richness of the language

The next section reviews the relevant empirical ndings that support orchallenge a connectionist approach to morphology We then discuss thecomputational principles that are essential for understanding both thegeneral approach and the basis for the results of the simulations whichfollow We conclude with a more general discussion of the strengths andlimitations of the current work as well as remaining challenges to theconnectionist approach

OVERVIEW OF RELEVANT EMPIRICAL FINDINGS

One of the earliest studies on the role of morphology in lexical processingwas conducted by Taft and Forster (1975) Using a lexical decision taskthey found that subjects were able to reject nonwords with pseudo-stems(eg DEPERTOIRE) faster than nonwords with real stems (eg DEJUVEN ATE)Based on these ndings Taft and Forster argued for a processing strategythat rst strips afxes from complex words and then searches for a storedstem representation Many subsequent studies using a variety ofexperimental techniques have examined the extent to which morpholo-gical information is used to decompose complex words in lexicalprocessing (see Feldman 1995) These studies have produced conictingresults leading to disagreement about the precise nature and locus ofmorphological decomposition For example Andrews (1986) argued thatcompound words are decomposed in lexical access sufxed words areoptionally decomposed in access and all types of complex words arerepresented as whole forms Alternatively Stanners Neiser Hernon andHall (1979) claimed that derived forms are processed as wholes butinected forms are decomposed Another account proposes that prexedwords are processed as whole forms but that sufxed words aredecomposed (Cole Beauvillain amp Segui 1989) Yet another approachadvocates two types of lexical access proceduresmdashwhole word anddecompositionalmdashwhich are activated in parallel (eg Burani Salmasoamp Caramazza 1984 Laudanna Badecker amp Caramazza 1989 1992Laudanna Cermele amp Caramazza 1997) While there has been noconsensus as to exactly when and where morphological processing occursthe majority of researchers agree that some form of discrete all-or-nonedecomposition based on morphological structure operates somewherewithin the lexical system

452 PLAUT AND GONNERMAN

The connectionist approach by contrast holds that decomposition is notan all-or-none phenomenon and that behavioural effects should be gradedreecting the degree of convergence among semantic phonological andorthographic codes One source of evidence for continuous gradedsimilarity structure in the semantic relatedness of morphologically complexwords and their stems comes from subjectsrsquo similarity judgements(Gonnermann 1999 Gonnerman et al 2000a) Gonnerman andcolleagues found that subjects produced highly consistent ratings that fellon a continuum judging some word pairs to be unrelated (eg CORNERndashCORN) some moderately related (eg DRESSERndashDRESS) and some highlyrelated (eg TEACHERndashTEACH) With regard to formal similarity Rueckl etal (1997) present ndings from both masked and standard fragmentcompletion tasks in which priming effects for morphologically related wordpairs varied in magnitude as a function of orthographic similarity

Further empirical support for the connectionist perspective comes froma series of cross-modal priming studies in which subjects made lexicaldecisions to written targets preceded by auditorily presented primes(Gonnerman 1999 Gonnerman et al 2000a see also Marslen-Wilson etal 1994) Gonnerman and colleagues compared the magnitude ofpriming as a function of the degree of semantic transparency betweenprime and target (eg high BREAKAGEndashBREAK intermediate SHORTAGEndashSHORT low MESSAGEndashMESS) Crucially all of the items were phonologi-cally transparent such that the stems were contained unchanged in thecomplex words Gonnerman and colleagues found that prime-target pairswith intermediate degrees of transparency also produced intermediatedegrees of priming Thus highly semantically-related word pairs (egBAKERndashBAKE) primed twice as much (40 vs 19 ms) as moderately relatedpairs (eg DRESSERndashDRESS) and unrelated pairs (eg CORNERndashCORN)showed no priming effects Strikingly similar results were also obtainedfor pairs of prexed words and their stems highly related pairs (egPREHEATndashHEAT) primed twice as much (42 vs 20 ms) as moderatelyrelated pairs (eg MIDSTREAMndashSTREAM) whereas unrelated pairs (egREHEARSEndashHEARSE) showed no priming Again as with the sufxed-stempairs phonological relatedness was held constant A third experimentshowed that these graded effects hold for derived-derived word pairsHighly related pairs (eg SAINTLYndashSAINTHOOD) produced signicantfacilitation (34 ms) whereas less related pairs (eg OBSERVATIONndashOBSERVANT) produced only moderate facilitatory effects (14 ms) thatfailed to reach signicance

Gonnerman et al (2000a) also found evidence for graded effects ofphonological relatedness among word pairs that were equated on measuresof semantic similarity Thus priming was greater among phonologicallyrelated pairs involving a consonant change (eg DELETIONndashDELETE 65 ms)

A CONNECTIONIST APPROACH TO MORPHOLOGY 453

than among pairs involving a vowel change (eg VAN ITYndashVAIN 48 ms)which in turn was greater than priming among pairs involving both voweland consonant changes (eg INTRODUCTIONndashINTRODUCE 35 ms)

In two nal experiments Gonnerman et al (2000a) found similarpriming effects among word pairs that are both semantically andphonologically similar but morphologically unrelated In one experimentsignicant priming (24 ms) was demonstrated for lsquopseudo-sufxedrsquo wordsthat do not share morphological stems (eg TRIVIALndashTRIndash E) but cruciallyonly if the words overlapped sufciently in both meaning and sound Datafrom a second experiment indicated priming (35 ms) for words taken fromsemantic and phonological clusters where there are no morphologicalrelationships (eg SNARLndashSN EER) Again priming occurred only if therewas sufcient overlap in both meaning and sound SW ITCHndashSW IM pairs didnot prime These ndings conrm that priming effects similar to thosefound for morphologically related items hold for word pairs that arecorrelated in meaning and sound but have no morphological relationships(see also Rueckl amp Dror 1994 for a related demonstration that subjectslearn an artical vocabulary better if formal and semantic similarity arecorrelated)

Thus Gonnerman et al (2000a) provided clear evidence for gradedeffects of similarity in semantics phonology and orthography on cross-modal priming Furthermore the graded semantic effects were shown togeneralise across morphological types holding for morphologically relatedsufxed-stem prexed-stem and sufxed-sufxed pairs The effects ofgraded semantic and phonological similarity were demonstrated even formorphologically unrelated word pairs Taken together these resultssupport a distributed connectionist approach where morphology char-acterises the regularities in the learned mapping between semantics andphonology

There are of course ndings that seem to challenge the notion thatmorphological effects can be reduced entirely to the joint inuence ofsemantic and formal similarity For example with regard to formalsimilarity Murrell and Morton (1974) found priming among formallyrelated word pairs only if they were morphologically related (eg CARSndashCAR but not CARDndashCAR) and concluded that priming is determined bymorphological structure rather than by formal relatedness per se Withregard to semantics Kempley and Morton (1982) found that regularlyinected word pairs (eg REndash ECTEDndashREndash ECTING) produced signicantfacilitation whereas irregularly inected pairs (eg HELDndashHOLDING) didnot They argued on the basis of these results that semantic properties ofwords do not inuence processing whereas morphology does Manyadditional studies have come to a similar conclusion that morphologicalstructure and not semantic or formal properties underlies lexical

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 3: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 447

which undergo similar changes (eg SIN G ) lsquolsquosangrsquorsquo DRIN K ) lsquolsquodrankrsquorsquo)along with a few very high-frequency arbitrary forms (eg GO ) lsquolsquowentrsquorsquo)A similar state of affairs holds for other aspects of morphology (seeMarchand 1969) For instance given an English verb the most productiveway of deriving the word for one who does the action is to add -ER thussomeone who RUN S is a RUNN ER someone who THINKS is a THINKER and soon Such words are transparent because their meaning can be deriveddirectly from the meanings of their morphemes However there are alsoopaque items someone who ndash IES a plane is not a ndash IER but a PILOT aCORN ER is not someone who CORNS

Many theories of morphological processing (eg Burani amp Caramazza1987 Marslen-Wilson Tyler Waksler amp Older 1994 Pinker 1991Prasada amp Pinker 1993 Schreuder amp Baayen 1995) respond to the factthat morphology is only partially systematic by handling the systematic andidiosyncratic aspects separately Idiosyncratic words are stored asunanalysed wholes whereas systematic words are represented andprocessed compositionally in terms of stems and afxes Recognition ofthe latter thus involves decomposing a complex word into its identiablestem and afx and then accessing the stored stem (Marslen-Wilson et al1994) Such dual-mechanism theories run into problems however becausemorphological structure is not just partial but gradedmdashthat is to say thereare many intermediate cases that are neither completely transparent(regular) nor completely opaque (irregular) Rather there are sub-regularities among the irregulars and between them and regulars Forexample the spoken forms of irregular verbs that do not change betweenpresent and past tense (eg HIT SPREAD) all end in either t or dmdashthesame phonemes that occur at the end of regular past-tense forms Thusthese items are not arbitrary exceptions the fact that they already sharethe phonological structure of a past-tense form contributes to theirpatterning Similarly a word like DRESSER is neither fully semanticallytransparent nor fully opaquemdashalthough a DRESSER is not someone whodresses but a piece of furniture that holds clothes it is clearly related to theactivity of dressing More generally the isolation of idiosyncratic fromsystematic knowledge which is at the core of dual-mechanism theories andwhich seems necessary to prevent the exceptions from compromisinggeneralisation to novel forms ends up being a serious handicap in the faceof the rich graded structure of language

These difculties have prompted many researchers to explore analternative way of expressing language knowledge based on connectionistor neural network modelling that can handle graded degrees ofsystematicity more naturally Connectionist models attempt to capturethe essential properties of the neural mechanisms that give rise tobehaviour by implementing cognitive processes in terms of cooperative

448 PLAUT AND GONNERMAN

and competitive interactions among large groups of simple neuron-likeunits (see McClelland Rumelhart amp PDP Research Group 1986McLeod Plunkett amp Rolls 1998 Quinlan 1991 Rumelhart McClellandamp PDP Research Group 1986b)

From a connectionist perspective morphology is a characterisation ofthe learned mapping between the surface forms of words (orthographyphonology) and their meanings (semantics) (see also Cottrell amp Plunkett1995 Gonnerman Devlin Andersen amp Seidenberg 2000b Hoeffner ampMcClelland 1993 Joanisse amp Seidenberg 1999 Rueckl MikolinskiRaveh Miner amp Mars 1997 Rueckl amp Raveh 1999) To the extent thata particular surface pattern occurs in many words and maps consistently tocertain aspects of meaning the internal representations will come to reectthis structure and treat the pattern componentiallymdashthat is represent andprocess it relatively independently of the other parts of the word (Plaut ampMcClelland 1993 Plaut McClelland Seidenberg amp Patterson 1996) Assurface patterns behave less systematically their internal representationswill be correspondingly less componential At the extreme end if an itemviolates what is otherwise a systematic aspect of the domain (eg CORN ER)its internal representation must be very non-componential in order toavoid being given a transparent interpretation (eg someone who CORN S)

To make the connectionist approach to morphology more concreteconsider the framework for lexical processing depicted in Figure 1 (seealso Gonnerman et al 2000b Joanisse amp Seidenberg 1999) Theorthography phonology and semantics of words are represented as

Figure 1 A connectionist framework for lexical processing The large arrows depict inputsand outputs of the system

A CONNECTIONIST APPROACH TO MORPHOLOGY 449

distributed patterns of activity over separate groups of processing unitsWithin each of these domains similar words are represented by similar(overlapping) patterns of activity For example CAT and DOG would beencoded by similar patterns in semantics but by dissimilar patterns inorthography and phonology whereas CAT and CAP would be similar inorthography and phonology but dissimilar in semantics Lexical tasksinvolve transformations between these patternsmdashfor example writtenword comprehension requires that the orthographic pattern for a wordgenerate the appropriate semantic pattern Such transformations areaccomplished via interactions among the units including additionalintermediate or hidden units that mediate between the other groups ofunits1 Unit interactions are governed by weighted excitatory andinhibitory connections between them When the network is presentedwith input (eg the orthography of CAT) units repeatedly update theiractivations as a function of the total input they receive via connectionsfrom other units until the network as a whole settles into a stable patternof activity constituting its response to the input (eg the meaning andpronunciation of CAT) The behaviour of the network in response to a giveninput thus depends on the specic values of the connection weights whichcollectively encode the knowledge of the system The network learns thesevalues by gradually adjusting the weights to improve performance basedon exposure to written words spoken words and their meanings Thislearning has the effect of inducing new internal representations over theintermediate units that are effective in accomplishing the relevant tasks

At a general level the reliance within the connectionist approach onlearning internal representations is in sharp contrast to more traditionalapproaches to language which typically make very strong assumptionsabout the structure of internal representations and the processes thatmanipulate them For example it is often assumed that underlyinglinguistic knowledge takes the form of explicit rules which operate overdiscrete symbolic representations (Chomsky 1957 Chomsky amp Halle1968 Fodor amp Pylyshyn 1988 Pinker 1991 Prasada amp Pinker 1993) andmoreover that this knowledge is in large part innately specied(Chomsky 1965 Crain 1991 Pinker 1994) By contrast rather thanstipulating the specic form and content of the knowledge required forperformance in a domain the connectionist approach instead species the

1 Readers may note that the current framework differs from the lsquolsquotrianglersquorsquo model of wordreading (eg Harm amp Seidenberg 1999 Plaut et al 1996 Seidenberg amp McClelland 1989) inhaving one rather than three groups of hidden units mediating among orthographyphonology and semantics Although this distinction is a matter of ongoing research Kelloand Plaut (in press) report strategic effects in word reading that seem problematic for thetriangle framework but can be simulated effectively by a model with a common group ofhidden units (Kello amp Plaut 1998 in preparation)

450 PLAUT AND GONNERMAN

tasks the system must perform but then leaves it up to the learning processto develop the necessary internal representations and processes (McClel-land St John amp Taraban 1989) The result is often that learning developsrepresentations and processes which are radically different from thoseproposed by traditional theories resulting in novel hypotheses and testablepredictions concerning human cognitive behaviour

In particular the connectionist approach to morphology makes thestrong prediction that the magnitudes of behavioural effects that reectmorphological processing should vary continuously as a function of thedegree of semantic and phonological transparency of words Thisprediction was recently conrmed in English by Gonnerman Andersenand Seidenberg (2000a) who found that prime-target pairs withintermediate degrees of semantic or phonological transparency producedintermediate degrees of priming

Other empirical ndings however would seem to challenge theconnectionist claim that morphology is derived from learning amongorthography phonology and semantics In particular it seems difcult toreconcile this perspective with demonstrations of morphological effectsthat are independent of both semantic and surface similarity Many ofthese demonstrations are in languages such as Hebrew that have muchricher morphological systems than English For example morphologicalpriming in Hebrew can occur in the absence of semantic priming or can beinsensitive to variations in semantic relatedness when surface similarity iscontrolled (Bentin amp Feldman 1990 Frost Deutsch amp Forster in press-aFrost Deutsch Gilboa Tannenbaum amp Marslen-Wilson in press-b FrostForster amp Deutsch 1997) It thus appears that the existence of primingunder these conditions is governed by morphological relatedness per seand not by semantic or surface similarity If according to the connectionistapproach morphological structure ultimately derives from gradedsystematicity among surface forms of words and their meanings therewould seem to be no basis for morphological effects that are independentof semantic and surface similarity

The current article reports on two simulations designed to evaluatewhether the existence of morphological priming in the absence of semanticsimilarity in morphologically rich languages is inconsistent with adistributed connectionist approach to morphology A common set ofprimes and targets ranging in transparency were embedded among alarger set of words that were either transparent or opaque forming twoarticial languages a morphologically rich language analogous to Hebrewand a morphologically impoverished language analogous to English Anetwork was then trained to map orthography to semantics for eitherlanguage and then tested for priming Both languages yielded morpho-logical priming for the more transparent targets but priming extended to

A CONNECTIONIST APPROACH TO MORPHOLOGY 451

semantically opaque targets only in the rich language The results suggestthat non-semantic morphological priming is in fact compatible with theconnectionist approach but that its occurrence is predicted to depend onthe degree of morphological richness of the language

The next section reviews the relevant empirical ndings that support orchallenge a connectionist approach to morphology We then discuss thecomputational principles that are essential for understanding both thegeneral approach and the basis for the results of the simulations whichfollow We conclude with a more general discussion of the strengths andlimitations of the current work as well as remaining challenges to theconnectionist approach

OVERVIEW OF RELEVANT EMPIRICAL FINDINGS

One of the earliest studies on the role of morphology in lexical processingwas conducted by Taft and Forster (1975) Using a lexical decision taskthey found that subjects were able to reject nonwords with pseudo-stems(eg DEPERTOIRE) faster than nonwords with real stems (eg DEJUVEN ATE)Based on these ndings Taft and Forster argued for a processing strategythat rst strips afxes from complex words and then searches for a storedstem representation Many subsequent studies using a variety ofexperimental techniques have examined the extent to which morpholo-gical information is used to decompose complex words in lexicalprocessing (see Feldman 1995) These studies have produced conictingresults leading to disagreement about the precise nature and locus ofmorphological decomposition For example Andrews (1986) argued thatcompound words are decomposed in lexical access sufxed words areoptionally decomposed in access and all types of complex words arerepresented as whole forms Alternatively Stanners Neiser Hernon andHall (1979) claimed that derived forms are processed as wholes butinected forms are decomposed Another account proposes that prexedwords are processed as whole forms but that sufxed words aredecomposed (Cole Beauvillain amp Segui 1989) Yet another approachadvocates two types of lexical access proceduresmdashwhole word anddecompositionalmdashwhich are activated in parallel (eg Burani Salmasoamp Caramazza 1984 Laudanna Badecker amp Caramazza 1989 1992Laudanna Cermele amp Caramazza 1997) While there has been noconsensus as to exactly when and where morphological processing occursthe majority of researchers agree that some form of discrete all-or-nonedecomposition based on morphological structure operates somewherewithin the lexical system

452 PLAUT AND GONNERMAN

The connectionist approach by contrast holds that decomposition is notan all-or-none phenomenon and that behavioural effects should be gradedreecting the degree of convergence among semantic phonological andorthographic codes One source of evidence for continuous gradedsimilarity structure in the semantic relatedness of morphologically complexwords and their stems comes from subjectsrsquo similarity judgements(Gonnermann 1999 Gonnerman et al 2000a) Gonnerman andcolleagues found that subjects produced highly consistent ratings that fellon a continuum judging some word pairs to be unrelated (eg CORNERndashCORN) some moderately related (eg DRESSERndashDRESS) and some highlyrelated (eg TEACHERndashTEACH) With regard to formal similarity Rueckl etal (1997) present ndings from both masked and standard fragmentcompletion tasks in which priming effects for morphologically related wordpairs varied in magnitude as a function of orthographic similarity

Further empirical support for the connectionist perspective comes froma series of cross-modal priming studies in which subjects made lexicaldecisions to written targets preceded by auditorily presented primes(Gonnerman 1999 Gonnerman et al 2000a see also Marslen-Wilson etal 1994) Gonnerman and colleagues compared the magnitude ofpriming as a function of the degree of semantic transparency betweenprime and target (eg high BREAKAGEndashBREAK intermediate SHORTAGEndashSHORT low MESSAGEndashMESS) Crucially all of the items were phonologi-cally transparent such that the stems were contained unchanged in thecomplex words Gonnerman and colleagues found that prime-target pairswith intermediate degrees of transparency also produced intermediatedegrees of priming Thus highly semantically-related word pairs (egBAKERndashBAKE) primed twice as much (40 vs 19 ms) as moderately relatedpairs (eg DRESSERndashDRESS) and unrelated pairs (eg CORNERndashCORN)showed no priming effects Strikingly similar results were also obtainedfor pairs of prexed words and their stems highly related pairs (egPREHEATndashHEAT) primed twice as much (42 vs 20 ms) as moderatelyrelated pairs (eg MIDSTREAMndashSTREAM) whereas unrelated pairs (egREHEARSEndashHEARSE) showed no priming Again as with the sufxed-stempairs phonological relatedness was held constant A third experimentshowed that these graded effects hold for derived-derived word pairsHighly related pairs (eg SAINTLYndashSAINTHOOD) produced signicantfacilitation (34 ms) whereas less related pairs (eg OBSERVATIONndashOBSERVANT) produced only moderate facilitatory effects (14 ms) thatfailed to reach signicance

Gonnerman et al (2000a) also found evidence for graded effects ofphonological relatedness among word pairs that were equated on measuresof semantic similarity Thus priming was greater among phonologicallyrelated pairs involving a consonant change (eg DELETIONndashDELETE 65 ms)

A CONNECTIONIST APPROACH TO MORPHOLOGY 453

than among pairs involving a vowel change (eg VAN ITYndashVAIN 48 ms)which in turn was greater than priming among pairs involving both voweland consonant changes (eg INTRODUCTIONndashINTRODUCE 35 ms)

In two nal experiments Gonnerman et al (2000a) found similarpriming effects among word pairs that are both semantically andphonologically similar but morphologically unrelated In one experimentsignicant priming (24 ms) was demonstrated for lsquopseudo-sufxedrsquo wordsthat do not share morphological stems (eg TRIVIALndashTRIndash E) but cruciallyonly if the words overlapped sufciently in both meaning and sound Datafrom a second experiment indicated priming (35 ms) for words taken fromsemantic and phonological clusters where there are no morphologicalrelationships (eg SNARLndashSN EER) Again priming occurred only if therewas sufcient overlap in both meaning and sound SW ITCHndashSW IM pairs didnot prime These ndings conrm that priming effects similar to thosefound for morphologically related items hold for word pairs that arecorrelated in meaning and sound but have no morphological relationships(see also Rueckl amp Dror 1994 for a related demonstration that subjectslearn an artical vocabulary better if formal and semantic similarity arecorrelated)

Thus Gonnerman et al (2000a) provided clear evidence for gradedeffects of similarity in semantics phonology and orthography on cross-modal priming Furthermore the graded semantic effects were shown togeneralise across morphological types holding for morphologically relatedsufxed-stem prexed-stem and sufxed-sufxed pairs The effects ofgraded semantic and phonological similarity were demonstrated even formorphologically unrelated word pairs Taken together these resultssupport a distributed connectionist approach where morphology char-acterises the regularities in the learned mapping between semantics andphonology

There are of course ndings that seem to challenge the notion thatmorphological effects can be reduced entirely to the joint inuence ofsemantic and formal similarity For example with regard to formalsimilarity Murrell and Morton (1974) found priming among formallyrelated word pairs only if they were morphologically related (eg CARSndashCAR but not CARDndashCAR) and concluded that priming is determined bymorphological structure rather than by formal relatedness per se Withregard to semantics Kempley and Morton (1982) found that regularlyinected word pairs (eg REndash ECTEDndashREndash ECTING) produced signicantfacilitation whereas irregularly inected pairs (eg HELDndashHOLDING) didnot They argued on the basis of these results that semantic properties ofwords do not inuence processing whereas morphology does Manyadditional studies have come to a similar conclusion that morphologicalstructure and not semantic or formal properties underlies lexical

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 4: Are non-semantic morphological effects incompatible with a ...

448 PLAUT AND GONNERMAN

and competitive interactions among large groups of simple neuron-likeunits (see McClelland Rumelhart amp PDP Research Group 1986McLeod Plunkett amp Rolls 1998 Quinlan 1991 Rumelhart McClellandamp PDP Research Group 1986b)

From a connectionist perspective morphology is a characterisation ofthe learned mapping between the surface forms of words (orthographyphonology) and their meanings (semantics) (see also Cottrell amp Plunkett1995 Gonnerman Devlin Andersen amp Seidenberg 2000b Hoeffner ampMcClelland 1993 Joanisse amp Seidenberg 1999 Rueckl MikolinskiRaveh Miner amp Mars 1997 Rueckl amp Raveh 1999) To the extent thata particular surface pattern occurs in many words and maps consistently tocertain aspects of meaning the internal representations will come to reectthis structure and treat the pattern componentiallymdashthat is represent andprocess it relatively independently of the other parts of the word (Plaut ampMcClelland 1993 Plaut McClelland Seidenberg amp Patterson 1996) Assurface patterns behave less systematically their internal representationswill be correspondingly less componential At the extreme end if an itemviolates what is otherwise a systematic aspect of the domain (eg CORN ER)its internal representation must be very non-componential in order toavoid being given a transparent interpretation (eg someone who CORN S)

To make the connectionist approach to morphology more concreteconsider the framework for lexical processing depicted in Figure 1 (seealso Gonnerman et al 2000b Joanisse amp Seidenberg 1999) Theorthography phonology and semantics of words are represented as

Figure 1 A connectionist framework for lexical processing The large arrows depict inputsand outputs of the system

A CONNECTIONIST APPROACH TO MORPHOLOGY 449

distributed patterns of activity over separate groups of processing unitsWithin each of these domains similar words are represented by similar(overlapping) patterns of activity For example CAT and DOG would beencoded by similar patterns in semantics but by dissimilar patterns inorthography and phonology whereas CAT and CAP would be similar inorthography and phonology but dissimilar in semantics Lexical tasksinvolve transformations between these patternsmdashfor example writtenword comprehension requires that the orthographic pattern for a wordgenerate the appropriate semantic pattern Such transformations areaccomplished via interactions among the units including additionalintermediate or hidden units that mediate between the other groups ofunits1 Unit interactions are governed by weighted excitatory andinhibitory connections between them When the network is presentedwith input (eg the orthography of CAT) units repeatedly update theiractivations as a function of the total input they receive via connectionsfrom other units until the network as a whole settles into a stable patternof activity constituting its response to the input (eg the meaning andpronunciation of CAT) The behaviour of the network in response to a giveninput thus depends on the specic values of the connection weights whichcollectively encode the knowledge of the system The network learns thesevalues by gradually adjusting the weights to improve performance basedon exposure to written words spoken words and their meanings Thislearning has the effect of inducing new internal representations over theintermediate units that are effective in accomplishing the relevant tasks

At a general level the reliance within the connectionist approach onlearning internal representations is in sharp contrast to more traditionalapproaches to language which typically make very strong assumptionsabout the structure of internal representations and the processes thatmanipulate them For example it is often assumed that underlyinglinguistic knowledge takes the form of explicit rules which operate overdiscrete symbolic representations (Chomsky 1957 Chomsky amp Halle1968 Fodor amp Pylyshyn 1988 Pinker 1991 Prasada amp Pinker 1993) andmoreover that this knowledge is in large part innately specied(Chomsky 1965 Crain 1991 Pinker 1994) By contrast rather thanstipulating the specic form and content of the knowledge required forperformance in a domain the connectionist approach instead species the

1 Readers may note that the current framework differs from the lsquolsquotrianglersquorsquo model of wordreading (eg Harm amp Seidenberg 1999 Plaut et al 1996 Seidenberg amp McClelland 1989) inhaving one rather than three groups of hidden units mediating among orthographyphonology and semantics Although this distinction is a matter of ongoing research Kelloand Plaut (in press) report strategic effects in word reading that seem problematic for thetriangle framework but can be simulated effectively by a model with a common group ofhidden units (Kello amp Plaut 1998 in preparation)

450 PLAUT AND GONNERMAN

tasks the system must perform but then leaves it up to the learning processto develop the necessary internal representations and processes (McClel-land St John amp Taraban 1989) The result is often that learning developsrepresentations and processes which are radically different from thoseproposed by traditional theories resulting in novel hypotheses and testablepredictions concerning human cognitive behaviour

In particular the connectionist approach to morphology makes thestrong prediction that the magnitudes of behavioural effects that reectmorphological processing should vary continuously as a function of thedegree of semantic and phonological transparency of words Thisprediction was recently conrmed in English by Gonnerman Andersenand Seidenberg (2000a) who found that prime-target pairs withintermediate degrees of semantic or phonological transparency producedintermediate degrees of priming

Other empirical ndings however would seem to challenge theconnectionist claim that morphology is derived from learning amongorthography phonology and semantics In particular it seems difcult toreconcile this perspective with demonstrations of morphological effectsthat are independent of both semantic and surface similarity Many ofthese demonstrations are in languages such as Hebrew that have muchricher morphological systems than English For example morphologicalpriming in Hebrew can occur in the absence of semantic priming or can beinsensitive to variations in semantic relatedness when surface similarity iscontrolled (Bentin amp Feldman 1990 Frost Deutsch amp Forster in press-aFrost Deutsch Gilboa Tannenbaum amp Marslen-Wilson in press-b FrostForster amp Deutsch 1997) It thus appears that the existence of primingunder these conditions is governed by morphological relatedness per seand not by semantic or surface similarity If according to the connectionistapproach morphological structure ultimately derives from gradedsystematicity among surface forms of words and their meanings therewould seem to be no basis for morphological effects that are independentof semantic and surface similarity

The current article reports on two simulations designed to evaluatewhether the existence of morphological priming in the absence of semanticsimilarity in morphologically rich languages is inconsistent with adistributed connectionist approach to morphology A common set ofprimes and targets ranging in transparency were embedded among alarger set of words that were either transparent or opaque forming twoarticial languages a morphologically rich language analogous to Hebrewand a morphologically impoverished language analogous to English Anetwork was then trained to map orthography to semantics for eitherlanguage and then tested for priming Both languages yielded morpho-logical priming for the more transparent targets but priming extended to

A CONNECTIONIST APPROACH TO MORPHOLOGY 451

semantically opaque targets only in the rich language The results suggestthat non-semantic morphological priming is in fact compatible with theconnectionist approach but that its occurrence is predicted to depend onthe degree of morphological richness of the language

The next section reviews the relevant empirical ndings that support orchallenge a connectionist approach to morphology We then discuss thecomputational principles that are essential for understanding both thegeneral approach and the basis for the results of the simulations whichfollow We conclude with a more general discussion of the strengths andlimitations of the current work as well as remaining challenges to theconnectionist approach

OVERVIEW OF RELEVANT EMPIRICAL FINDINGS

One of the earliest studies on the role of morphology in lexical processingwas conducted by Taft and Forster (1975) Using a lexical decision taskthey found that subjects were able to reject nonwords with pseudo-stems(eg DEPERTOIRE) faster than nonwords with real stems (eg DEJUVEN ATE)Based on these ndings Taft and Forster argued for a processing strategythat rst strips afxes from complex words and then searches for a storedstem representation Many subsequent studies using a variety ofexperimental techniques have examined the extent to which morpholo-gical information is used to decompose complex words in lexicalprocessing (see Feldman 1995) These studies have produced conictingresults leading to disagreement about the precise nature and locus ofmorphological decomposition For example Andrews (1986) argued thatcompound words are decomposed in lexical access sufxed words areoptionally decomposed in access and all types of complex words arerepresented as whole forms Alternatively Stanners Neiser Hernon andHall (1979) claimed that derived forms are processed as wholes butinected forms are decomposed Another account proposes that prexedwords are processed as whole forms but that sufxed words aredecomposed (Cole Beauvillain amp Segui 1989) Yet another approachadvocates two types of lexical access proceduresmdashwhole word anddecompositionalmdashwhich are activated in parallel (eg Burani Salmasoamp Caramazza 1984 Laudanna Badecker amp Caramazza 1989 1992Laudanna Cermele amp Caramazza 1997) While there has been noconsensus as to exactly when and where morphological processing occursthe majority of researchers agree that some form of discrete all-or-nonedecomposition based on morphological structure operates somewherewithin the lexical system

452 PLAUT AND GONNERMAN

The connectionist approach by contrast holds that decomposition is notan all-or-none phenomenon and that behavioural effects should be gradedreecting the degree of convergence among semantic phonological andorthographic codes One source of evidence for continuous gradedsimilarity structure in the semantic relatedness of morphologically complexwords and their stems comes from subjectsrsquo similarity judgements(Gonnermann 1999 Gonnerman et al 2000a) Gonnerman andcolleagues found that subjects produced highly consistent ratings that fellon a continuum judging some word pairs to be unrelated (eg CORNERndashCORN) some moderately related (eg DRESSERndashDRESS) and some highlyrelated (eg TEACHERndashTEACH) With regard to formal similarity Rueckl etal (1997) present ndings from both masked and standard fragmentcompletion tasks in which priming effects for morphologically related wordpairs varied in magnitude as a function of orthographic similarity

Further empirical support for the connectionist perspective comes froma series of cross-modal priming studies in which subjects made lexicaldecisions to written targets preceded by auditorily presented primes(Gonnerman 1999 Gonnerman et al 2000a see also Marslen-Wilson etal 1994) Gonnerman and colleagues compared the magnitude ofpriming as a function of the degree of semantic transparency betweenprime and target (eg high BREAKAGEndashBREAK intermediate SHORTAGEndashSHORT low MESSAGEndashMESS) Crucially all of the items were phonologi-cally transparent such that the stems were contained unchanged in thecomplex words Gonnerman and colleagues found that prime-target pairswith intermediate degrees of transparency also produced intermediatedegrees of priming Thus highly semantically-related word pairs (egBAKERndashBAKE) primed twice as much (40 vs 19 ms) as moderately relatedpairs (eg DRESSERndashDRESS) and unrelated pairs (eg CORNERndashCORN)showed no priming effects Strikingly similar results were also obtainedfor pairs of prexed words and their stems highly related pairs (egPREHEATndashHEAT) primed twice as much (42 vs 20 ms) as moderatelyrelated pairs (eg MIDSTREAMndashSTREAM) whereas unrelated pairs (egREHEARSEndashHEARSE) showed no priming Again as with the sufxed-stempairs phonological relatedness was held constant A third experimentshowed that these graded effects hold for derived-derived word pairsHighly related pairs (eg SAINTLYndashSAINTHOOD) produced signicantfacilitation (34 ms) whereas less related pairs (eg OBSERVATIONndashOBSERVANT) produced only moderate facilitatory effects (14 ms) thatfailed to reach signicance

Gonnerman et al (2000a) also found evidence for graded effects ofphonological relatedness among word pairs that were equated on measuresof semantic similarity Thus priming was greater among phonologicallyrelated pairs involving a consonant change (eg DELETIONndashDELETE 65 ms)

A CONNECTIONIST APPROACH TO MORPHOLOGY 453

than among pairs involving a vowel change (eg VAN ITYndashVAIN 48 ms)which in turn was greater than priming among pairs involving both voweland consonant changes (eg INTRODUCTIONndashINTRODUCE 35 ms)

In two nal experiments Gonnerman et al (2000a) found similarpriming effects among word pairs that are both semantically andphonologically similar but morphologically unrelated In one experimentsignicant priming (24 ms) was demonstrated for lsquopseudo-sufxedrsquo wordsthat do not share morphological stems (eg TRIVIALndashTRIndash E) but cruciallyonly if the words overlapped sufciently in both meaning and sound Datafrom a second experiment indicated priming (35 ms) for words taken fromsemantic and phonological clusters where there are no morphologicalrelationships (eg SNARLndashSN EER) Again priming occurred only if therewas sufcient overlap in both meaning and sound SW ITCHndashSW IM pairs didnot prime These ndings conrm that priming effects similar to thosefound for morphologically related items hold for word pairs that arecorrelated in meaning and sound but have no morphological relationships(see also Rueckl amp Dror 1994 for a related demonstration that subjectslearn an artical vocabulary better if formal and semantic similarity arecorrelated)

Thus Gonnerman et al (2000a) provided clear evidence for gradedeffects of similarity in semantics phonology and orthography on cross-modal priming Furthermore the graded semantic effects were shown togeneralise across morphological types holding for morphologically relatedsufxed-stem prexed-stem and sufxed-sufxed pairs The effects ofgraded semantic and phonological similarity were demonstrated even formorphologically unrelated word pairs Taken together these resultssupport a distributed connectionist approach where morphology char-acterises the regularities in the learned mapping between semantics andphonology

There are of course ndings that seem to challenge the notion thatmorphological effects can be reduced entirely to the joint inuence ofsemantic and formal similarity For example with regard to formalsimilarity Murrell and Morton (1974) found priming among formallyrelated word pairs only if they were morphologically related (eg CARSndashCAR but not CARDndashCAR) and concluded that priming is determined bymorphological structure rather than by formal relatedness per se Withregard to semantics Kempley and Morton (1982) found that regularlyinected word pairs (eg REndash ECTEDndashREndash ECTING) produced signicantfacilitation whereas irregularly inected pairs (eg HELDndashHOLDING) didnot They argued on the basis of these results that semantic properties ofwords do not inuence processing whereas morphology does Manyadditional studies have come to a similar conclusion that morphologicalstructure and not semantic or formal properties underlies lexical

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 5: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 449

distributed patterns of activity over separate groups of processing unitsWithin each of these domains similar words are represented by similar(overlapping) patterns of activity For example CAT and DOG would beencoded by similar patterns in semantics but by dissimilar patterns inorthography and phonology whereas CAT and CAP would be similar inorthography and phonology but dissimilar in semantics Lexical tasksinvolve transformations between these patternsmdashfor example writtenword comprehension requires that the orthographic pattern for a wordgenerate the appropriate semantic pattern Such transformations areaccomplished via interactions among the units including additionalintermediate or hidden units that mediate between the other groups ofunits1 Unit interactions are governed by weighted excitatory andinhibitory connections between them When the network is presentedwith input (eg the orthography of CAT) units repeatedly update theiractivations as a function of the total input they receive via connectionsfrom other units until the network as a whole settles into a stable patternof activity constituting its response to the input (eg the meaning andpronunciation of CAT) The behaviour of the network in response to a giveninput thus depends on the specic values of the connection weights whichcollectively encode the knowledge of the system The network learns thesevalues by gradually adjusting the weights to improve performance basedon exposure to written words spoken words and their meanings Thislearning has the effect of inducing new internal representations over theintermediate units that are effective in accomplishing the relevant tasks

At a general level the reliance within the connectionist approach onlearning internal representations is in sharp contrast to more traditionalapproaches to language which typically make very strong assumptionsabout the structure of internal representations and the processes thatmanipulate them For example it is often assumed that underlyinglinguistic knowledge takes the form of explicit rules which operate overdiscrete symbolic representations (Chomsky 1957 Chomsky amp Halle1968 Fodor amp Pylyshyn 1988 Pinker 1991 Prasada amp Pinker 1993) andmoreover that this knowledge is in large part innately specied(Chomsky 1965 Crain 1991 Pinker 1994) By contrast rather thanstipulating the specic form and content of the knowledge required forperformance in a domain the connectionist approach instead species the

1 Readers may note that the current framework differs from the lsquolsquotrianglersquorsquo model of wordreading (eg Harm amp Seidenberg 1999 Plaut et al 1996 Seidenberg amp McClelland 1989) inhaving one rather than three groups of hidden units mediating among orthographyphonology and semantics Although this distinction is a matter of ongoing research Kelloand Plaut (in press) report strategic effects in word reading that seem problematic for thetriangle framework but can be simulated effectively by a model with a common group ofhidden units (Kello amp Plaut 1998 in preparation)

450 PLAUT AND GONNERMAN

tasks the system must perform but then leaves it up to the learning processto develop the necessary internal representations and processes (McClel-land St John amp Taraban 1989) The result is often that learning developsrepresentations and processes which are radically different from thoseproposed by traditional theories resulting in novel hypotheses and testablepredictions concerning human cognitive behaviour

In particular the connectionist approach to morphology makes thestrong prediction that the magnitudes of behavioural effects that reectmorphological processing should vary continuously as a function of thedegree of semantic and phonological transparency of words Thisprediction was recently conrmed in English by Gonnerman Andersenand Seidenberg (2000a) who found that prime-target pairs withintermediate degrees of semantic or phonological transparency producedintermediate degrees of priming

Other empirical ndings however would seem to challenge theconnectionist claim that morphology is derived from learning amongorthography phonology and semantics In particular it seems difcult toreconcile this perspective with demonstrations of morphological effectsthat are independent of both semantic and surface similarity Many ofthese demonstrations are in languages such as Hebrew that have muchricher morphological systems than English For example morphologicalpriming in Hebrew can occur in the absence of semantic priming or can beinsensitive to variations in semantic relatedness when surface similarity iscontrolled (Bentin amp Feldman 1990 Frost Deutsch amp Forster in press-aFrost Deutsch Gilboa Tannenbaum amp Marslen-Wilson in press-b FrostForster amp Deutsch 1997) It thus appears that the existence of primingunder these conditions is governed by morphological relatedness per seand not by semantic or surface similarity If according to the connectionistapproach morphological structure ultimately derives from gradedsystematicity among surface forms of words and their meanings therewould seem to be no basis for morphological effects that are independentof semantic and surface similarity

The current article reports on two simulations designed to evaluatewhether the existence of morphological priming in the absence of semanticsimilarity in morphologically rich languages is inconsistent with adistributed connectionist approach to morphology A common set ofprimes and targets ranging in transparency were embedded among alarger set of words that were either transparent or opaque forming twoarticial languages a morphologically rich language analogous to Hebrewand a morphologically impoverished language analogous to English Anetwork was then trained to map orthography to semantics for eitherlanguage and then tested for priming Both languages yielded morpho-logical priming for the more transparent targets but priming extended to

A CONNECTIONIST APPROACH TO MORPHOLOGY 451

semantically opaque targets only in the rich language The results suggestthat non-semantic morphological priming is in fact compatible with theconnectionist approach but that its occurrence is predicted to depend onthe degree of morphological richness of the language

The next section reviews the relevant empirical ndings that support orchallenge a connectionist approach to morphology We then discuss thecomputational principles that are essential for understanding both thegeneral approach and the basis for the results of the simulations whichfollow We conclude with a more general discussion of the strengths andlimitations of the current work as well as remaining challenges to theconnectionist approach

OVERVIEW OF RELEVANT EMPIRICAL FINDINGS

One of the earliest studies on the role of morphology in lexical processingwas conducted by Taft and Forster (1975) Using a lexical decision taskthey found that subjects were able to reject nonwords with pseudo-stems(eg DEPERTOIRE) faster than nonwords with real stems (eg DEJUVEN ATE)Based on these ndings Taft and Forster argued for a processing strategythat rst strips afxes from complex words and then searches for a storedstem representation Many subsequent studies using a variety ofexperimental techniques have examined the extent to which morpholo-gical information is used to decompose complex words in lexicalprocessing (see Feldman 1995) These studies have produced conictingresults leading to disagreement about the precise nature and locus ofmorphological decomposition For example Andrews (1986) argued thatcompound words are decomposed in lexical access sufxed words areoptionally decomposed in access and all types of complex words arerepresented as whole forms Alternatively Stanners Neiser Hernon andHall (1979) claimed that derived forms are processed as wholes butinected forms are decomposed Another account proposes that prexedwords are processed as whole forms but that sufxed words aredecomposed (Cole Beauvillain amp Segui 1989) Yet another approachadvocates two types of lexical access proceduresmdashwhole word anddecompositionalmdashwhich are activated in parallel (eg Burani Salmasoamp Caramazza 1984 Laudanna Badecker amp Caramazza 1989 1992Laudanna Cermele amp Caramazza 1997) While there has been noconsensus as to exactly when and where morphological processing occursthe majority of researchers agree that some form of discrete all-or-nonedecomposition based on morphological structure operates somewherewithin the lexical system

452 PLAUT AND GONNERMAN

The connectionist approach by contrast holds that decomposition is notan all-or-none phenomenon and that behavioural effects should be gradedreecting the degree of convergence among semantic phonological andorthographic codes One source of evidence for continuous gradedsimilarity structure in the semantic relatedness of morphologically complexwords and their stems comes from subjectsrsquo similarity judgements(Gonnermann 1999 Gonnerman et al 2000a) Gonnerman andcolleagues found that subjects produced highly consistent ratings that fellon a continuum judging some word pairs to be unrelated (eg CORNERndashCORN) some moderately related (eg DRESSERndashDRESS) and some highlyrelated (eg TEACHERndashTEACH) With regard to formal similarity Rueckl etal (1997) present ndings from both masked and standard fragmentcompletion tasks in which priming effects for morphologically related wordpairs varied in magnitude as a function of orthographic similarity

Further empirical support for the connectionist perspective comes froma series of cross-modal priming studies in which subjects made lexicaldecisions to written targets preceded by auditorily presented primes(Gonnerman 1999 Gonnerman et al 2000a see also Marslen-Wilson etal 1994) Gonnerman and colleagues compared the magnitude ofpriming as a function of the degree of semantic transparency betweenprime and target (eg high BREAKAGEndashBREAK intermediate SHORTAGEndashSHORT low MESSAGEndashMESS) Crucially all of the items were phonologi-cally transparent such that the stems were contained unchanged in thecomplex words Gonnerman and colleagues found that prime-target pairswith intermediate degrees of transparency also produced intermediatedegrees of priming Thus highly semantically-related word pairs (egBAKERndashBAKE) primed twice as much (40 vs 19 ms) as moderately relatedpairs (eg DRESSERndashDRESS) and unrelated pairs (eg CORNERndashCORN)showed no priming effects Strikingly similar results were also obtainedfor pairs of prexed words and their stems highly related pairs (egPREHEATndashHEAT) primed twice as much (42 vs 20 ms) as moderatelyrelated pairs (eg MIDSTREAMndashSTREAM) whereas unrelated pairs (egREHEARSEndashHEARSE) showed no priming Again as with the sufxed-stempairs phonological relatedness was held constant A third experimentshowed that these graded effects hold for derived-derived word pairsHighly related pairs (eg SAINTLYndashSAINTHOOD) produced signicantfacilitation (34 ms) whereas less related pairs (eg OBSERVATIONndashOBSERVANT) produced only moderate facilitatory effects (14 ms) thatfailed to reach signicance

Gonnerman et al (2000a) also found evidence for graded effects ofphonological relatedness among word pairs that were equated on measuresof semantic similarity Thus priming was greater among phonologicallyrelated pairs involving a consonant change (eg DELETIONndashDELETE 65 ms)

A CONNECTIONIST APPROACH TO MORPHOLOGY 453

than among pairs involving a vowel change (eg VAN ITYndashVAIN 48 ms)which in turn was greater than priming among pairs involving both voweland consonant changes (eg INTRODUCTIONndashINTRODUCE 35 ms)

In two nal experiments Gonnerman et al (2000a) found similarpriming effects among word pairs that are both semantically andphonologically similar but morphologically unrelated In one experimentsignicant priming (24 ms) was demonstrated for lsquopseudo-sufxedrsquo wordsthat do not share morphological stems (eg TRIVIALndashTRIndash E) but cruciallyonly if the words overlapped sufciently in both meaning and sound Datafrom a second experiment indicated priming (35 ms) for words taken fromsemantic and phonological clusters where there are no morphologicalrelationships (eg SNARLndashSN EER) Again priming occurred only if therewas sufcient overlap in both meaning and sound SW ITCHndashSW IM pairs didnot prime These ndings conrm that priming effects similar to thosefound for morphologically related items hold for word pairs that arecorrelated in meaning and sound but have no morphological relationships(see also Rueckl amp Dror 1994 for a related demonstration that subjectslearn an artical vocabulary better if formal and semantic similarity arecorrelated)

Thus Gonnerman et al (2000a) provided clear evidence for gradedeffects of similarity in semantics phonology and orthography on cross-modal priming Furthermore the graded semantic effects were shown togeneralise across morphological types holding for morphologically relatedsufxed-stem prexed-stem and sufxed-sufxed pairs The effects ofgraded semantic and phonological similarity were demonstrated even formorphologically unrelated word pairs Taken together these resultssupport a distributed connectionist approach where morphology char-acterises the regularities in the learned mapping between semantics andphonology

There are of course ndings that seem to challenge the notion thatmorphological effects can be reduced entirely to the joint inuence ofsemantic and formal similarity For example with regard to formalsimilarity Murrell and Morton (1974) found priming among formallyrelated word pairs only if they were morphologically related (eg CARSndashCAR but not CARDndashCAR) and concluded that priming is determined bymorphological structure rather than by formal relatedness per se Withregard to semantics Kempley and Morton (1982) found that regularlyinected word pairs (eg REndash ECTEDndashREndash ECTING) produced signicantfacilitation whereas irregularly inected pairs (eg HELDndashHOLDING) didnot They argued on the basis of these results that semantic properties ofwords do not inuence processing whereas morphology does Manyadditional studies have come to a similar conclusion that morphologicalstructure and not semantic or formal properties underlies lexical

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 6: Are non-semantic morphological effects incompatible with a ...

450 PLAUT AND GONNERMAN

tasks the system must perform but then leaves it up to the learning processto develop the necessary internal representations and processes (McClel-land St John amp Taraban 1989) The result is often that learning developsrepresentations and processes which are radically different from thoseproposed by traditional theories resulting in novel hypotheses and testablepredictions concerning human cognitive behaviour

In particular the connectionist approach to morphology makes thestrong prediction that the magnitudes of behavioural effects that reectmorphological processing should vary continuously as a function of thedegree of semantic and phonological transparency of words Thisprediction was recently conrmed in English by Gonnerman Andersenand Seidenberg (2000a) who found that prime-target pairs withintermediate degrees of semantic or phonological transparency producedintermediate degrees of priming

Other empirical ndings however would seem to challenge theconnectionist claim that morphology is derived from learning amongorthography phonology and semantics In particular it seems difcult toreconcile this perspective with demonstrations of morphological effectsthat are independent of both semantic and surface similarity Many ofthese demonstrations are in languages such as Hebrew that have muchricher morphological systems than English For example morphologicalpriming in Hebrew can occur in the absence of semantic priming or can beinsensitive to variations in semantic relatedness when surface similarity iscontrolled (Bentin amp Feldman 1990 Frost Deutsch amp Forster in press-aFrost Deutsch Gilboa Tannenbaum amp Marslen-Wilson in press-b FrostForster amp Deutsch 1997) It thus appears that the existence of primingunder these conditions is governed by morphological relatedness per seand not by semantic or surface similarity If according to the connectionistapproach morphological structure ultimately derives from gradedsystematicity among surface forms of words and their meanings therewould seem to be no basis for morphological effects that are independentof semantic and surface similarity

The current article reports on two simulations designed to evaluatewhether the existence of morphological priming in the absence of semanticsimilarity in morphologically rich languages is inconsistent with adistributed connectionist approach to morphology A common set ofprimes and targets ranging in transparency were embedded among alarger set of words that were either transparent or opaque forming twoarticial languages a morphologically rich language analogous to Hebrewand a morphologically impoverished language analogous to English Anetwork was then trained to map orthography to semantics for eitherlanguage and then tested for priming Both languages yielded morpho-logical priming for the more transparent targets but priming extended to

A CONNECTIONIST APPROACH TO MORPHOLOGY 451

semantically opaque targets only in the rich language The results suggestthat non-semantic morphological priming is in fact compatible with theconnectionist approach but that its occurrence is predicted to depend onthe degree of morphological richness of the language

The next section reviews the relevant empirical ndings that support orchallenge a connectionist approach to morphology We then discuss thecomputational principles that are essential for understanding both thegeneral approach and the basis for the results of the simulations whichfollow We conclude with a more general discussion of the strengths andlimitations of the current work as well as remaining challenges to theconnectionist approach

OVERVIEW OF RELEVANT EMPIRICAL FINDINGS

One of the earliest studies on the role of morphology in lexical processingwas conducted by Taft and Forster (1975) Using a lexical decision taskthey found that subjects were able to reject nonwords with pseudo-stems(eg DEPERTOIRE) faster than nonwords with real stems (eg DEJUVEN ATE)Based on these ndings Taft and Forster argued for a processing strategythat rst strips afxes from complex words and then searches for a storedstem representation Many subsequent studies using a variety ofexperimental techniques have examined the extent to which morpholo-gical information is used to decompose complex words in lexicalprocessing (see Feldman 1995) These studies have produced conictingresults leading to disagreement about the precise nature and locus ofmorphological decomposition For example Andrews (1986) argued thatcompound words are decomposed in lexical access sufxed words areoptionally decomposed in access and all types of complex words arerepresented as whole forms Alternatively Stanners Neiser Hernon andHall (1979) claimed that derived forms are processed as wholes butinected forms are decomposed Another account proposes that prexedwords are processed as whole forms but that sufxed words aredecomposed (Cole Beauvillain amp Segui 1989) Yet another approachadvocates two types of lexical access proceduresmdashwhole word anddecompositionalmdashwhich are activated in parallel (eg Burani Salmasoamp Caramazza 1984 Laudanna Badecker amp Caramazza 1989 1992Laudanna Cermele amp Caramazza 1997) While there has been noconsensus as to exactly when and where morphological processing occursthe majority of researchers agree that some form of discrete all-or-nonedecomposition based on morphological structure operates somewherewithin the lexical system

452 PLAUT AND GONNERMAN

The connectionist approach by contrast holds that decomposition is notan all-or-none phenomenon and that behavioural effects should be gradedreecting the degree of convergence among semantic phonological andorthographic codes One source of evidence for continuous gradedsimilarity structure in the semantic relatedness of morphologically complexwords and their stems comes from subjectsrsquo similarity judgements(Gonnermann 1999 Gonnerman et al 2000a) Gonnerman andcolleagues found that subjects produced highly consistent ratings that fellon a continuum judging some word pairs to be unrelated (eg CORNERndashCORN) some moderately related (eg DRESSERndashDRESS) and some highlyrelated (eg TEACHERndashTEACH) With regard to formal similarity Rueckl etal (1997) present ndings from both masked and standard fragmentcompletion tasks in which priming effects for morphologically related wordpairs varied in magnitude as a function of orthographic similarity

Further empirical support for the connectionist perspective comes froma series of cross-modal priming studies in which subjects made lexicaldecisions to written targets preceded by auditorily presented primes(Gonnerman 1999 Gonnerman et al 2000a see also Marslen-Wilson etal 1994) Gonnerman and colleagues compared the magnitude ofpriming as a function of the degree of semantic transparency betweenprime and target (eg high BREAKAGEndashBREAK intermediate SHORTAGEndashSHORT low MESSAGEndashMESS) Crucially all of the items were phonologi-cally transparent such that the stems were contained unchanged in thecomplex words Gonnerman and colleagues found that prime-target pairswith intermediate degrees of transparency also produced intermediatedegrees of priming Thus highly semantically-related word pairs (egBAKERndashBAKE) primed twice as much (40 vs 19 ms) as moderately relatedpairs (eg DRESSERndashDRESS) and unrelated pairs (eg CORNERndashCORN)showed no priming effects Strikingly similar results were also obtainedfor pairs of prexed words and their stems highly related pairs (egPREHEATndashHEAT) primed twice as much (42 vs 20 ms) as moderatelyrelated pairs (eg MIDSTREAMndashSTREAM) whereas unrelated pairs (egREHEARSEndashHEARSE) showed no priming Again as with the sufxed-stempairs phonological relatedness was held constant A third experimentshowed that these graded effects hold for derived-derived word pairsHighly related pairs (eg SAINTLYndashSAINTHOOD) produced signicantfacilitation (34 ms) whereas less related pairs (eg OBSERVATIONndashOBSERVANT) produced only moderate facilitatory effects (14 ms) thatfailed to reach signicance

Gonnerman et al (2000a) also found evidence for graded effects ofphonological relatedness among word pairs that were equated on measuresof semantic similarity Thus priming was greater among phonologicallyrelated pairs involving a consonant change (eg DELETIONndashDELETE 65 ms)

A CONNECTIONIST APPROACH TO MORPHOLOGY 453

than among pairs involving a vowel change (eg VAN ITYndashVAIN 48 ms)which in turn was greater than priming among pairs involving both voweland consonant changes (eg INTRODUCTIONndashINTRODUCE 35 ms)

In two nal experiments Gonnerman et al (2000a) found similarpriming effects among word pairs that are both semantically andphonologically similar but morphologically unrelated In one experimentsignicant priming (24 ms) was demonstrated for lsquopseudo-sufxedrsquo wordsthat do not share morphological stems (eg TRIVIALndashTRIndash E) but cruciallyonly if the words overlapped sufciently in both meaning and sound Datafrom a second experiment indicated priming (35 ms) for words taken fromsemantic and phonological clusters where there are no morphologicalrelationships (eg SNARLndashSN EER) Again priming occurred only if therewas sufcient overlap in both meaning and sound SW ITCHndashSW IM pairs didnot prime These ndings conrm that priming effects similar to thosefound for morphologically related items hold for word pairs that arecorrelated in meaning and sound but have no morphological relationships(see also Rueckl amp Dror 1994 for a related demonstration that subjectslearn an artical vocabulary better if formal and semantic similarity arecorrelated)

Thus Gonnerman et al (2000a) provided clear evidence for gradedeffects of similarity in semantics phonology and orthography on cross-modal priming Furthermore the graded semantic effects were shown togeneralise across morphological types holding for morphologically relatedsufxed-stem prexed-stem and sufxed-sufxed pairs The effects ofgraded semantic and phonological similarity were demonstrated even formorphologically unrelated word pairs Taken together these resultssupport a distributed connectionist approach where morphology char-acterises the regularities in the learned mapping between semantics andphonology

There are of course ndings that seem to challenge the notion thatmorphological effects can be reduced entirely to the joint inuence ofsemantic and formal similarity For example with regard to formalsimilarity Murrell and Morton (1974) found priming among formallyrelated word pairs only if they were morphologically related (eg CARSndashCAR but not CARDndashCAR) and concluded that priming is determined bymorphological structure rather than by formal relatedness per se Withregard to semantics Kempley and Morton (1982) found that regularlyinected word pairs (eg REndash ECTEDndashREndash ECTING) produced signicantfacilitation whereas irregularly inected pairs (eg HELDndashHOLDING) didnot They argued on the basis of these results that semantic properties ofwords do not inuence processing whereas morphology does Manyadditional studies have come to a similar conclusion that morphologicalstructure and not semantic or formal properties underlies lexical

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 7: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 451

semantically opaque targets only in the rich language The results suggestthat non-semantic morphological priming is in fact compatible with theconnectionist approach but that its occurrence is predicted to depend onthe degree of morphological richness of the language

The next section reviews the relevant empirical ndings that support orchallenge a connectionist approach to morphology We then discuss thecomputational principles that are essential for understanding both thegeneral approach and the basis for the results of the simulations whichfollow We conclude with a more general discussion of the strengths andlimitations of the current work as well as remaining challenges to theconnectionist approach

OVERVIEW OF RELEVANT EMPIRICAL FINDINGS

One of the earliest studies on the role of morphology in lexical processingwas conducted by Taft and Forster (1975) Using a lexical decision taskthey found that subjects were able to reject nonwords with pseudo-stems(eg DEPERTOIRE) faster than nonwords with real stems (eg DEJUVEN ATE)Based on these ndings Taft and Forster argued for a processing strategythat rst strips afxes from complex words and then searches for a storedstem representation Many subsequent studies using a variety ofexperimental techniques have examined the extent to which morpholo-gical information is used to decompose complex words in lexicalprocessing (see Feldman 1995) These studies have produced conictingresults leading to disagreement about the precise nature and locus ofmorphological decomposition For example Andrews (1986) argued thatcompound words are decomposed in lexical access sufxed words areoptionally decomposed in access and all types of complex words arerepresented as whole forms Alternatively Stanners Neiser Hernon andHall (1979) claimed that derived forms are processed as wholes butinected forms are decomposed Another account proposes that prexedwords are processed as whole forms but that sufxed words aredecomposed (Cole Beauvillain amp Segui 1989) Yet another approachadvocates two types of lexical access proceduresmdashwhole word anddecompositionalmdashwhich are activated in parallel (eg Burani Salmasoamp Caramazza 1984 Laudanna Badecker amp Caramazza 1989 1992Laudanna Cermele amp Caramazza 1997) While there has been noconsensus as to exactly when and where morphological processing occursthe majority of researchers agree that some form of discrete all-or-nonedecomposition based on morphological structure operates somewherewithin the lexical system

452 PLAUT AND GONNERMAN

The connectionist approach by contrast holds that decomposition is notan all-or-none phenomenon and that behavioural effects should be gradedreecting the degree of convergence among semantic phonological andorthographic codes One source of evidence for continuous gradedsimilarity structure in the semantic relatedness of morphologically complexwords and their stems comes from subjectsrsquo similarity judgements(Gonnermann 1999 Gonnerman et al 2000a) Gonnerman andcolleagues found that subjects produced highly consistent ratings that fellon a continuum judging some word pairs to be unrelated (eg CORNERndashCORN) some moderately related (eg DRESSERndashDRESS) and some highlyrelated (eg TEACHERndashTEACH) With regard to formal similarity Rueckl etal (1997) present ndings from both masked and standard fragmentcompletion tasks in which priming effects for morphologically related wordpairs varied in magnitude as a function of orthographic similarity

Further empirical support for the connectionist perspective comes froma series of cross-modal priming studies in which subjects made lexicaldecisions to written targets preceded by auditorily presented primes(Gonnerman 1999 Gonnerman et al 2000a see also Marslen-Wilson etal 1994) Gonnerman and colleagues compared the magnitude ofpriming as a function of the degree of semantic transparency betweenprime and target (eg high BREAKAGEndashBREAK intermediate SHORTAGEndashSHORT low MESSAGEndashMESS) Crucially all of the items were phonologi-cally transparent such that the stems were contained unchanged in thecomplex words Gonnerman and colleagues found that prime-target pairswith intermediate degrees of transparency also produced intermediatedegrees of priming Thus highly semantically-related word pairs (egBAKERndashBAKE) primed twice as much (40 vs 19 ms) as moderately relatedpairs (eg DRESSERndashDRESS) and unrelated pairs (eg CORNERndashCORN)showed no priming effects Strikingly similar results were also obtainedfor pairs of prexed words and their stems highly related pairs (egPREHEATndashHEAT) primed twice as much (42 vs 20 ms) as moderatelyrelated pairs (eg MIDSTREAMndashSTREAM) whereas unrelated pairs (egREHEARSEndashHEARSE) showed no priming Again as with the sufxed-stempairs phonological relatedness was held constant A third experimentshowed that these graded effects hold for derived-derived word pairsHighly related pairs (eg SAINTLYndashSAINTHOOD) produced signicantfacilitation (34 ms) whereas less related pairs (eg OBSERVATIONndashOBSERVANT) produced only moderate facilitatory effects (14 ms) thatfailed to reach signicance

Gonnerman et al (2000a) also found evidence for graded effects ofphonological relatedness among word pairs that were equated on measuresof semantic similarity Thus priming was greater among phonologicallyrelated pairs involving a consonant change (eg DELETIONndashDELETE 65 ms)

A CONNECTIONIST APPROACH TO MORPHOLOGY 453

than among pairs involving a vowel change (eg VAN ITYndashVAIN 48 ms)which in turn was greater than priming among pairs involving both voweland consonant changes (eg INTRODUCTIONndashINTRODUCE 35 ms)

In two nal experiments Gonnerman et al (2000a) found similarpriming effects among word pairs that are both semantically andphonologically similar but morphologically unrelated In one experimentsignicant priming (24 ms) was demonstrated for lsquopseudo-sufxedrsquo wordsthat do not share morphological stems (eg TRIVIALndashTRIndash E) but cruciallyonly if the words overlapped sufciently in both meaning and sound Datafrom a second experiment indicated priming (35 ms) for words taken fromsemantic and phonological clusters where there are no morphologicalrelationships (eg SNARLndashSN EER) Again priming occurred only if therewas sufcient overlap in both meaning and sound SW ITCHndashSW IM pairs didnot prime These ndings conrm that priming effects similar to thosefound for morphologically related items hold for word pairs that arecorrelated in meaning and sound but have no morphological relationships(see also Rueckl amp Dror 1994 for a related demonstration that subjectslearn an artical vocabulary better if formal and semantic similarity arecorrelated)

Thus Gonnerman et al (2000a) provided clear evidence for gradedeffects of similarity in semantics phonology and orthography on cross-modal priming Furthermore the graded semantic effects were shown togeneralise across morphological types holding for morphologically relatedsufxed-stem prexed-stem and sufxed-sufxed pairs The effects ofgraded semantic and phonological similarity were demonstrated even formorphologically unrelated word pairs Taken together these resultssupport a distributed connectionist approach where morphology char-acterises the regularities in the learned mapping between semantics andphonology

There are of course ndings that seem to challenge the notion thatmorphological effects can be reduced entirely to the joint inuence ofsemantic and formal similarity For example with regard to formalsimilarity Murrell and Morton (1974) found priming among formallyrelated word pairs only if they were morphologically related (eg CARSndashCAR but not CARDndashCAR) and concluded that priming is determined bymorphological structure rather than by formal relatedness per se Withregard to semantics Kempley and Morton (1982) found that regularlyinected word pairs (eg REndash ECTEDndashREndash ECTING) produced signicantfacilitation whereas irregularly inected pairs (eg HELDndashHOLDING) didnot They argued on the basis of these results that semantic properties ofwords do not inuence processing whereas morphology does Manyadditional studies have come to a similar conclusion that morphologicalstructure and not semantic or formal properties underlies lexical

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 8: Are non-semantic morphological effects incompatible with a ...

452 PLAUT AND GONNERMAN

The connectionist approach by contrast holds that decomposition is notan all-or-none phenomenon and that behavioural effects should be gradedreecting the degree of convergence among semantic phonological andorthographic codes One source of evidence for continuous gradedsimilarity structure in the semantic relatedness of morphologically complexwords and their stems comes from subjectsrsquo similarity judgements(Gonnermann 1999 Gonnerman et al 2000a) Gonnerman andcolleagues found that subjects produced highly consistent ratings that fellon a continuum judging some word pairs to be unrelated (eg CORNERndashCORN) some moderately related (eg DRESSERndashDRESS) and some highlyrelated (eg TEACHERndashTEACH) With regard to formal similarity Rueckl etal (1997) present ndings from both masked and standard fragmentcompletion tasks in which priming effects for morphologically related wordpairs varied in magnitude as a function of orthographic similarity

Further empirical support for the connectionist perspective comes froma series of cross-modal priming studies in which subjects made lexicaldecisions to written targets preceded by auditorily presented primes(Gonnerman 1999 Gonnerman et al 2000a see also Marslen-Wilson etal 1994) Gonnerman and colleagues compared the magnitude ofpriming as a function of the degree of semantic transparency betweenprime and target (eg high BREAKAGEndashBREAK intermediate SHORTAGEndashSHORT low MESSAGEndashMESS) Crucially all of the items were phonologi-cally transparent such that the stems were contained unchanged in thecomplex words Gonnerman and colleagues found that prime-target pairswith intermediate degrees of transparency also produced intermediatedegrees of priming Thus highly semantically-related word pairs (egBAKERndashBAKE) primed twice as much (40 vs 19 ms) as moderately relatedpairs (eg DRESSERndashDRESS) and unrelated pairs (eg CORNERndashCORN)showed no priming effects Strikingly similar results were also obtainedfor pairs of prexed words and their stems highly related pairs (egPREHEATndashHEAT) primed twice as much (42 vs 20 ms) as moderatelyrelated pairs (eg MIDSTREAMndashSTREAM) whereas unrelated pairs (egREHEARSEndashHEARSE) showed no priming Again as with the sufxed-stempairs phonological relatedness was held constant A third experimentshowed that these graded effects hold for derived-derived word pairsHighly related pairs (eg SAINTLYndashSAINTHOOD) produced signicantfacilitation (34 ms) whereas less related pairs (eg OBSERVATIONndashOBSERVANT) produced only moderate facilitatory effects (14 ms) thatfailed to reach signicance

Gonnerman et al (2000a) also found evidence for graded effects ofphonological relatedness among word pairs that were equated on measuresof semantic similarity Thus priming was greater among phonologicallyrelated pairs involving a consonant change (eg DELETIONndashDELETE 65 ms)

A CONNECTIONIST APPROACH TO MORPHOLOGY 453

than among pairs involving a vowel change (eg VAN ITYndashVAIN 48 ms)which in turn was greater than priming among pairs involving both voweland consonant changes (eg INTRODUCTIONndashINTRODUCE 35 ms)

In two nal experiments Gonnerman et al (2000a) found similarpriming effects among word pairs that are both semantically andphonologically similar but morphologically unrelated In one experimentsignicant priming (24 ms) was demonstrated for lsquopseudo-sufxedrsquo wordsthat do not share morphological stems (eg TRIVIALndashTRIndash E) but cruciallyonly if the words overlapped sufciently in both meaning and sound Datafrom a second experiment indicated priming (35 ms) for words taken fromsemantic and phonological clusters where there are no morphologicalrelationships (eg SNARLndashSN EER) Again priming occurred only if therewas sufcient overlap in both meaning and sound SW ITCHndashSW IM pairs didnot prime These ndings conrm that priming effects similar to thosefound for morphologically related items hold for word pairs that arecorrelated in meaning and sound but have no morphological relationships(see also Rueckl amp Dror 1994 for a related demonstration that subjectslearn an artical vocabulary better if formal and semantic similarity arecorrelated)

Thus Gonnerman et al (2000a) provided clear evidence for gradedeffects of similarity in semantics phonology and orthography on cross-modal priming Furthermore the graded semantic effects were shown togeneralise across morphological types holding for morphologically relatedsufxed-stem prexed-stem and sufxed-sufxed pairs The effects ofgraded semantic and phonological similarity were demonstrated even formorphologically unrelated word pairs Taken together these resultssupport a distributed connectionist approach where morphology char-acterises the regularities in the learned mapping between semantics andphonology

There are of course ndings that seem to challenge the notion thatmorphological effects can be reduced entirely to the joint inuence ofsemantic and formal similarity For example with regard to formalsimilarity Murrell and Morton (1974) found priming among formallyrelated word pairs only if they were morphologically related (eg CARSndashCAR but not CARDndashCAR) and concluded that priming is determined bymorphological structure rather than by formal relatedness per se Withregard to semantics Kempley and Morton (1982) found that regularlyinected word pairs (eg REndash ECTEDndashREndash ECTING) produced signicantfacilitation whereas irregularly inected pairs (eg HELDndashHOLDING) didnot They argued on the basis of these results that semantic properties ofwords do not inuence processing whereas morphology does Manyadditional studies have come to a similar conclusion that morphologicalstructure and not semantic or formal properties underlies lexical

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 9: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 453

than among pairs involving a vowel change (eg VAN ITYndashVAIN 48 ms)which in turn was greater than priming among pairs involving both voweland consonant changes (eg INTRODUCTIONndashINTRODUCE 35 ms)

In two nal experiments Gonnerman et al (2000a) found similarpriming effects among word pairs that are both semantically andphonologically similar but morphologically unrelated In one experimentsignicant priming (24 ms) was demonstrated for lsquopseudo-sufxedrsquo wordsthat do not share morphological stems (eg TRIVIALndashTRIndash E) but cruciallyonly if the words overlapped sufciently in both meaning and sound Datafrom a second experiment indicated priming (35 ms) for words taken fromsemantic and phonological clusters where there are no morphologicalrelationships (eg SNARLndashSN EER) Again priming occurred only if therewas sufcient overlap in both meaning and sound SW ITCHndashSW IM pairs didnot prime These ndings conrm that priming effects similar to thosefound for morphologically related items hold for word pairs that arecorrelated in meaning and sound but have no morphological relationships(see also Rueckl amp Dror 1994 for a related demonstration that subjectslearn an artical vocabulary better if formal and semantic similarity arecorrelated)

Thus Gonnerman et al (2000a) provided clear evidence for gradedeffects of similarity in semantics phonology and orthography on cross-modal priming Furthermore the graded semantic effects were shown togeneralise across morphological types holding for morphologically relatedsufxed-stem prexed-stem and sufxed-sufxed pairs The effects ofgraded semantic and phonological similarity were demonstrated even formorphologically unrelated word pairs Taken together these resultssupport a distributed connectionist approach where morphology char-acterises the regularities in the learned mapping between semantics andphonology

There are of course ndings that seem to challenge the notion thatmorphological effects can be reduced entirely to the joint inuence ofsemantic and formal similarity For example with regard to formalsimilarity Murrell and Morton (1974) found priming among formallyrelated word pairs only if they were morphologically related (eg CARSndashCAR but not CARDndashCAR) and concluded that priming is determined bymorphological structure rather than by formal relatedness per se Withregard to semantics Kempley and Morton (1982) found that regularlyinected word pairs (eg REndash ECTEDndashREndash ECTING) produced signicantfacilitation whereas irregularly inected pairs (eg HELDndashHOLDING) didnot They argued on the basis of these results that semantic properties ofwords do not inuence processing whereas morphology does Manyadditional studies have come to a similar conclusion that morphologicalstructure and not semantic or formal properties underlies lexical

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 10: Are non-semantic morphological effects incompatible with a ...

454 PLAUT AND GONNERMAN

processing (eg Fowler Napps amp Feldman 1985 Grainger Cole ampSegui 1991 Marslen-Wilson et al 1994 Napps 1989 Napps amp Fowler1987 Stolz amp Besner 1998)

It is important to note however that none of these studies examined ormanipulated semantic and formal similarity jointly For example Nappsand colleagues failed to nd effects of semantic factors on morphologicalpriming in one study (Fowler et al 1985) and failed to nd effects ofphonological factors in another (Napps amp Fowler 1987) and concludedthat lsquolsquomorphemic priming is not the result of the convergence of semanticorthographic and phonological relationships but rather that morphemicrelationships are represented explicitly in the lexiconrsquorsquo (Napps 1989p 729) Similarly Marslen-Wilson et al (1994) failed to nd an effect ofdegree of phonological change between stem and derived forms but they(unlike Gonnerman et al 2000a) did not equate semantic similarity acrossconditions In a subsequent experiment they found evidence that primingwas inuenced by morphological typemdashsufxed-stem word pairs exhibitedpriming but sufxed-sufxed pairs did notmdasheven though the conditionswere matched on semantic similarity Gonnerman et al (2000b) showedhowever that the word pairs used by Marslen-Wilson et al (1994) in thesufxed-sufxed condition were systematically less phonologically similarthan the pairs in the sufxed-stem condition in fact sufxed-sufxed pairsdo yield priming if they are sufciently related (Gonnerman et al 2000a)Gonnerman and colleagues also demonstrated that a distributed connec-tionist model trained on the experimental stimuli closely replicated theempirically observed pattern of priming

More recently Stolz and Besner (1998) explicitly question the ability ofthe connectionist approach to account for their morphological effectsThey conducted a series of experiments where subjects made lexicaldecisions after performing a letter search task Two experiments showedpriming for morphologically related items (eg MARKEDndashMARK) comparedto either an unrelated or an orthographic baseline (eg MARKETndashMARK) Ina third experiment morphologically related items produced signicantpriming whereas items that were only semantically related (eg GOLDndashSILVER) did not The authors also collected semantic relatedness ratingsand demonstrated that these ratings could not account for the differencesin priming between morphologically and semantically related items Basedon the combined results from their separate experiments Stolz and Besnerargue that morphological effects are independent of either orthographic orsemantic similarity and thus challenge a connectionist approach Recallhowever that form and meaning interact on our connectionist account sowe would in fact predict greater priming for items that are similar in bothmeaning and sound (eg morphologically related pairs) than for thoserelated only on one of the two dimensions Indeed the Stolz and Besner

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 11: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 455

results far from presenting a challenge are entirely consistent with adistributed connectionist approach

Overall the evidence for morphological effects in English that are notreducible to semantic or formal similarity is we believe questionable atbest More convincing evidence of pure morphological effects comes fromexperiments in languages such as Hebrew with much richer morphologicalstructure than English As mentioned in the Introduction virtually everyword in Hebrew is morphologically complex being formed by interdigitat-ing a mostly vowel-based word pattern into a three-consonant root (seeBentin amp Frost 1995 Berman 1978) Words derived from the same rootexhibit a range of semantic relatedness Bentin and Feldman (1990) tookadvantage of this fact to compare priming of a target word (eg MIGDAL

[tower] root GDL) by morphologically related primes that either weresemantically related to the target (eg GADOL [big]) or were not (egGYDUL [tumor]) as well as by semantically related but morphologicallyunrelated primes (eg TZERIAH [turret]) For long-lag repetition primingthey found no reliable non-morphological semantic priming and equivalentlevels of morphological priming regardless of whether the primes weresemantically related or not (28 and 24 ms respectively) Analogous resultshave also been obtained using brief masked priming (Frost et al 1997)Note that non-semantic morphological priming in Hebrew is not restrictedto within-modality (eg visual-visual) presentation as Frost et al (in press-b) also found reliable priming for morphologically related but semanticallyopaque items under cross-modal presentation (26 ms) In this case themagnitude of priming did increase if the items were also semanticallyrelated (42 ms) although this is not surprising as the paradigm also yieldssemantic priming among morphologically unrelated items (Gonnerman etal 2000a Marslen-Wilson et al 1994)

There are analogous items in English that are morphologically relatedon a linguistic analysis but have little if any semantic similarity Forexample Aronoff (1976) has argued that -MIT operates like a morpheme inthat it undergoes systematic morphological transformation across a broadclass of words (eg TRANSMITndashTRAN SMISSION REMITndashREMISSION COMMITndashCOMMISSION) even though the semantic contribution of -MIT in these wordsis largely opaque Note however that such items do not seem to producereliable behavioural priming effects under either cross-modal presentation(Marslen-Wilson et al 1994) or visual-visual presentation with 250 msprime duration (Feldman amp Soltano 1999) Thus English appears tocontrast with Hebrew in this respect

An even more direct challenge to the connectionist approach comesfrom recent demonstrations that lsquostructuralrsquo manipulations of surface formthat retain surface and semantic similarity can nonetheless have aprofound impact on morphological priming (Frost et al in press-a) Frost

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 12: Are non-semantic morphological effects incompatible with a ...

456 PLAUT AND GONNERMAN

and colleagues found that among verbs prime-target pairs in Hebrew withdifferent roots but a common word pattern produce reliable priming ifroots have three consonants but that this priming is eliminated when theprime is a lsquoweakrsquo form containing only two consonants The word-patternpriming is reinstated however if the missing consonant is replaced withanother random consonant to form a lsquopseudo-rootrsquo (resulting in a nonwordprime) The authors were careful to ensure that the manipulation of theconsonantal roots did not alter the orthographic overlap or introducesemantic similarity among the resulting primes and targets Thus itappears as if the existence of priming in this context is governed not bysemantic or surface similarity but by the presence of a structurally intactmorphologically dened three-consonant root

Taken at face value the evidence of pure morphological effects inHebrew would seem to conict with the connectionist proposal thatmorphology derives from learned relationships among the surface forms ofwords and their meanings Indeed Frost et al (in press-a) claim that theirresults are incompatible with distributed connectionist systems of whichlsquoall that can be said is that a level of hidden units picks up the correlationsbetween phonology and semantics or orthography and semantics andthese underlie the morphological effectsrsquo To more fully understand whatthe connectionist approach actually predicts in this context however wemust consider in detail some of the computational principles that underliethe approach It will turn out that the implications of these principles formorphological processing are far more subtle than is often assumed andthat the degree to which morphologically related items without semanticsimilarity should exhibit priming on a connectionist account depends onthe degree of morphological structure of the entire language These claimsare supported by simulations described in this article

PRINCIPLES UNDERLYING THE CONNECTIONISTAPPROACH

The connectionist approach instantiates a number of computationalprinciples that are relevant to morphological processing (see Figure 2)We discuss ve central ones in some detail because they are important forunderstanding the conditions under which the approach predicts morpho-logical effects in the absence of semantic andor phonological similarity(for additional background on principles of connectionist modelling seeChauvin amp Rumelhart 1995 Hertz Krogh amp Palmer 1991 McClelland etal 1986 Rumelhart Hinton amp Williams 1986a Smolensky Mozer ampRumelhart 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 13: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 457

Principle 1 Distributed representations

Although many inuential connectionist models employ localist represen-tations in which individual units correspond to familiar entities like wordsand concepts (see Page in press for discussion) a richer set of theoreticalprinciples is available through the use of distributed representations(Hinton McClelland amp Rumelhart 1986) In a distributed representationfamiliar entities are encoded by alternative patterns of activity over thesame set of units where each unit participates in representing manyentities For any pair of entities their similarity in a given domain isreected by the degree of overlap (ie number of shared active units) intheir representations over the units for that domain The full set ofpairwise similarities among the representations for a given domainconstitutes the similarity structure of that domain Note that this similaritystructure is intrinsic to a set of distributed representations whereas withlocalist representations it must be imposed externally by appropriateassociative weights between units

Figure 2 Principles of connectionist modelling In the diagrams the circles represent unitsand the ovals represent groups of units The rectangles represent similarity spaces in which thepoints correspond to different entities the distance between points indicates the relativesimilarity (overlap) of the corresponding representations Arrows indicate mappings betweenrepresentations

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 14: Are non-semantic morphological effects incompatible with a ...

458 PLAUT AND GONNERMAN

Principle 2 Systematicity

Whereas similarity structure concerns the relationships among representa-tions within a domain systematicity concerns the relationship between thesimilarity structures of two domains In order to perform a task a networkmust map from the representations in one domain to the representations inanother For example oral reading involves mapping the orthographicrepresentations of words to their phonological representations whereasspoken word comprehension involves mapping phonological representa-tions to semantic representations In general the ease of performing amapping will depend on its systematicitymdashthat is the extent to whichpatterns that are similar in one domain map to patterns that are similar inthe other domain Thus in English oral reading is highly systematicbecause words that are spelled similarly (eg CAT and CAP) tend to bepronounced similarly (see the left side of Figure 2b) By contrast spokencomprehension of monomorphemic words is highly unsystematic becausephonological similarity is unrelated to semantic similarity (see the rightside of Figure 2b)

Systematic mappings are easier for connectionist networks to performbecause networks have an inherent tendency to produce similar responsesto similar inputs This is because unit activations are determined by rstcomputing a linear weighted sum of inputs from other units and thenapplying a smooth non-linear (typically sigmoidal) function to thissummed input A similar pattern of activity over the sending units willgenerally produce a similar summed input to the receiving unit and hencea similar activation value The tendency to give a similar output to similarinputs is violated only when the differences between the sending patternshappen to be over units with very large connection weights to the receivingunits It is possible for learning to develop large weights when it isnecessary to perform a task but it takes a long time to do so (Hinton ampSejnowski 1986)

Principle 3 Learned internal representations

Mappings that are highly systematic can often be carried out by a networkwith only direct connections from input units to output units (see egZorzi Houghton amp Butterworth 1998 for such a model in the domain oforal reading) Such two-layer networks are however severely restricted inthe mappings they can perform (Minsky amp Papert 1969) In order to carryout more complicated mappings a network must have the ability to re-represent the input patterns such that they can can be mapped effectivelyto the output patterns More specically the network requires additionalhidden units between the inputs and outputs and it must developrepresentations over these units such that the similarity structure among

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 15: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 459

hidden representations is sufciently similar to that of both the input andoutput representations Each separate transformation (ie input-to-hiddenand hidden-to-output) is then more systematic than the full input-to-output mapping

In effect one can think of the hidden representations as lsquolsquosplitting thedifferencersquorsquo between the structure of the input and the structure of theoutput (this is depicted in Figure 2c by the fact that the arrows map instraight lines through the intermediate similarity space) For a relativelysystematic mapping the hidden representations will preserve much of thecommon similarity structure between the inputs and outputs deviatingfrom this only for the more idiosyncratic aspects of the mapping Bycontrast for an unsystematic mapping in which the similarity structures ofthe inputs and outputs are very different the structure of the hiddenrepresentations may need to be rather different from each of them

The tendency for hidden representations to adopt intermediatesimilarity structure was illustrated by Plaut (1991) for an attractor networkthat mapped from orthography to semantics via a single hidden layer Thistask is a particularly useful one to investigate as orthographic similarity islargely unrelated to semantic similarity Figure 3 shows over the course ofeight iterations of settling the degree to which the activations over thehidden and semantic layers exhibit semantic structure versus orthographicstructure This was measured by computing the correlation between thepairwise similarities of representations at each layer with the pairwisesimilarities among the orthographic or semantic representations respec-tively The gure shows that the similarity structure of the networkrsquoshidden representations (open symbols) falls between those of orthographyand semantics (closed symbols) Specically over the last three iterationsthe semantic activations are perfectly correlated with semantic similaritystructure because the network is successful at generating the correctsemantic patterns The low degree of orthographic structure of these nalsemantic activations indicates that semantic structure is largely uncorre-lated with orthographic structure Most interesting however is that thenal hidden activations are only moderately (and about equally) correlatedwith both orthographic structure and with semantic structure Thus inlearning to map orthography to semantics the network has developedhidden representations that compromise between orthographic andsemantic similarity structure

Principle 4 Componentiality

A form of systematicity that is particularly relevant to morphology iscomponentialitymdashthe degree to which parts of the input can be mappedindependently from the rest of the input According to standard theories of

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 16: Are non-semantic morphological effects incompatible with a ...

460 PLAUT AND GONNERMAN

morphology morphemes are discrete units that either are or are notcontained in a given word From a connectionist perspective by contrastthe notion of lsquomorphemersquo is an inherently graded concept because theextent to which a particular part of the orthographic or phonological inputbehaves independently of the rest of the input is always a matter of degree(Bybee 1985) Also note that the relevant parts of the input need not becontiguous as in prexes and sufxes in concatenative systems likeEnglish Even non-contiguous subsets of the input such as roots and wordpatterns in Hebrew can function morphologically if they behave system-atically with respect to meaning

A network comes to exhibit degrees of componentiality in its behaviourbecause on the basis of exposure to examples of inputs and outputs from a

Figure 3 Changes in the similarity structure among hidden representations and amongsemantic representations during settling in a network that mapped orthographic representa-tions to semantic representations (Plaut 1991 Plaut amp Shallice 1993) lsquoSemantic structurersquoand lsquoorthographic structurersquo refer to the pairwise similarities among specified semantic ororthographic representations respectively (Adapted from Plaut 1991)

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 17: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 461

task it must determine not only what aspects of each input are important forgenerating the correct output but also what aspects are uninformative andshould be ignored This knowledge can then apply across large classes ofitems only within small subclasses or evenbe restricted to individual itemsIn this way the network learns to map parts of the input to parts of theoutput in a way that is as independent as possible from how the remainingparts of the input are mapped This provides a type of combinatorialgeneralisation by allowing novel recombinations of familiar parts to beprocessed effectively In short a network can develop componentialrepresentations that handle the systematic aspects of the task and thatgeneralise to novel forms while simultaneously developing less componen-tial representations for handling the idiosyncratic aspects of the task

The ability of connectionist networks to learn internal representationswith varying degrees of componentiality has been illustrated by Plaut et al(1996) in the domain of English word reading After training an attractornetwork to map from orthography to phonology for about 3000monosyllabic words Plaut and colleagues determined the minimum levelof activation over each orthographic input cluster (onset vowel coda)required to activate each phonological output cluster correctly Theycontrasted regular consistent words that obey the standard spelling-soundcorrespondences of English (ie GAVE MINT analogous to morphologicallytransparent words) with exception words that violated these correspon-dences (ie HAVE PINT analogous to opaque words) They also consideredthe intermediate case of so-called ambiguous words which contain spellingpatterns with multiple common pronunciations (eg ndashOWN in DOW N TOW NBROW N etc vs KN OWN SHOW N GROW N etc)

The results are shown in Figure 4 The network developed highlycomponential representations for regular consistent words in that eachphonological cluster depended only on the corresponding orthographiccluster (ie onset ) onset vowel ) vowel coda ) coda) Thiscomponentiality supports effective generalisation to pronounceable non-words (eg MAVE RIN T) because each cluster in the input is mapped more-or-less independently of the others Onsets and codas in the other wordclasses are also treated componentially because they typically obey thestandard spelling-sound correspondences By contrast the vowel cluster ofexception wordsmdashwhich is the primary locus of exceptionality in Englishmdashis treated non-componentially Correct activation of the vowel in thephonological output was sensitive to activation across the entire ortho-graphic input Moreover the representations of the ambiguous words withintermediate levels of spelling-sound consistency showed intermediatedegrees of componentiality Thus the network solved the task by learninginternal representations with the appropriate degree of componentialitygiven the specic nature of the statistical structure of the task

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 18: Are non-semantic morphological effects incompatible with a ...

462 PLAUT AND GONNERMAN

Figure 4 Componential representations in the word reading model of Plaut et al (1996Simulation 3) for Taraban and McClellandrsquos (1987) regular consistent words (top) andexception words (bottom) and for orthographically matched lsquoambiguousrsquo words with anintermediate degree of spellingndashsound consistency (middle) The height of each bar indicatesthe minimum activity level of the indicated orthographic cluster to correctly activate aparticular phonological cluster Words lacking either an onset or coda consonant cluster inorthography were excluded from the analysis (Adapted from Plaut et al 1996)

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 19: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 463

By the same token a network trained on inputs with gradedmorphological structure would be expected to develop representationswith the corresponding degrees of componentiality (see eg Rueckl ampRaveh 1999) That is items with a relatively high degree of morphologicaltransparency (eg TEACHER) will be represented and processed componen-tially (TEACH + -ER) whereas relatively opaque items (eg CORNER) will betreated more holistically Critically the connectionist approach offers acontinuum of compositionality such that items with an intermediatedegree of transparency (eg DRESSER) analogous to the ambiguous wordswill participate in the more general componential structure to some degreebut will also deviate from this structure somewhat

Principle 5 One system

The nal principle is perhaps the one most directly relevant to the currentsimulation work It concerns the simple fact that all items within a task areprocessed by the same system so that the knowledge of how to handle allaspects of the task must be superimposed on the same set of weights Thusalthough a single network can handle both the systematic regulartransparent aspects of the task and the idiosyncratic irregular opaqueaspects (and everything in between) these aspects are not treatedindependently of one another Rather every item is processed in thecontext of knowledge about every other item and the general organisationof the hidden representations is determined by the overall nature of thetask If a task contains a large proportion of opaque items it will bedifcult for a network to discover and take advantage of the systematicitiesunderlying the few transparent items On the other hand if a task is largelysystematic the hidden representations will be dominated by this structuremaking it more difcult to learn the opaque items indeed the opaqueitems will to some extent be carried along by this structure and theirrepresentations will have to compensate

This property may have important implications for understanding thegeneral pattern of empirical results reviewed in the previous sectionsuggesting that morphological effects that are independent of semantic andsurface similarity are found only in languages with rich morphologicalstructure (eg Hebrew) but not in languages with relatively impoverishedmorphology (eg English) On the connectionist account a given set ofopaque items will be represented and processed differently depending onthe degree of systematicity of the rest of the task In an unsystematic taskthe opaque items will be free to behave idiosyncratically because the otheritems do not combine to exert a coherent inuence In an otherwisesystematic task the same opaque items will be subject to the stronginuence of the overriding transparent structure of the task and thus will

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 20: Are non-semantic morphological effects incompatible with a ...

464 PLAUT AND GONNERMAN

necessarily be represented in a way that coexists effectively with thisstructure This is depicted in Figure 2e by the fact that mapping for thelone opaque item is distorted to be more consistent with the mappings fortransparent items than it would have been in isolation (straight linethrough open circle) The inuence of overall task structure on the way inwhich opaque items are represented and processed within connectionistnetworks forms the basis of our account of empirical ndings withinmorphologically rich languages like Hebrew of morphological effects thatare not reducible to semantic andor phonological similarity

SIMULATION 1

The preceding discussion of the computational principles underlying adistributed connectionist approach to morphology suggests that althoughmorphology ultimately derives from learned systematic relationshipsbetween word forms and meanings the representational structure thatemerges from this learning may at least under some circumstances extendto the processing of morphologically simple opaque forms In this way theapproach may be consistent not only with graded inuences of semanticand formal similarity on morphological priming (eg Gonnerman et al2000a) but also with ndings of effects of morphology that areindependent of semantic and formal similarity (eg Bentin amp Feldman1990 Frost et al 1997 in press-a) Although it may seem at rst glance asif the approach is underconstrained if it can account for such inconsistentpatterns of performance it actually provides a possible explanation for thediversity of empirical observations In particular the connectionistapproach may provide insight into why the specic pattern of morpholo-gical priming observed depends on the overall degree of morphologicalrichness of the language

To further substantiate this account we carried out a simulation inwhich a network was trained on an abstract version of the task ofcomprehending morphologically complex written words in either of twolanguages2 One of the languages had a rich degree of morphologicalstructure analogous to Hebrew whereas the other had far lessmorphological structure analogous to English For present purposes themorphological lsquorichnessrsquo of a language was operationalised in terms ofthe frequency and extent to which the structure of words within the

2 Although our general framework for lexical processing includes interactions of bothorthography and phonology with semantics (see Figure 1) we included only orthography inthe current simulation because the current work is directed at understanding ndings fromlanguages such as English and Hebrew in which orthographic and phonological structure arehighly correlated Thus for present purposes the inclusion of phonology would have beenlargely redundant with orthography

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 21: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 465

languagemdashboth in terms of their surface forms and their meaningsmdashcouldbe systematically decomposed into components that participate in theconstruction of many other words More specically each languageemployed a small number of lsquoafxesrsquo languages differed both in theproportion of words containing an afx and in the degree to which themeanings of such words could be predicted from the lsquotransparentrsquomeanings of their stems and afxes The instantiation of morphologicalstructure within the simulation was however very abstract and notintended to approximate the actual morphology of any real language inany detail Rather it was designed to capture the most generalcharacteristics of morphological structure and to permit tightly controlledmanipulations of this structure in a way analogous to variations ofmorphological richness across languages The goal was to demonstratethat on a connectionist account the occurrence of non-semanticmorphological effects depends on the degree of morphological structureof the entire language We consider the implications of the limitations inour approximation of morphological structure in the General Discussion

Method

Network architecture The architecture of the network is shown inFigure 5 Thirty orthographic units were fully connected to 300 hiddenunits which in turn were fully connected to 50 semantic units The hiddenand semantic units also had trainable biases that are equivalent toconnections from a unit whose activation is always equal to 10 Includingthe biases the network had a total of 24350 connections

The activation aj of each unit j was computed using the standard sigmoidfunction s( )

Figure 5 The architecture of the network

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 22: Are non-semantic morphological effects incompatible with a ...

466 PLAUT AND GONNERMAN

aj = si

aiwij =1

1 + exp plusmn i aiwij(1)

where wij are the weights on connections from sending units iEach weight was initialised to a random number drawn from a uniform

distribution between 01 except for the biases These were initialised tosplusmn1 01( ) = plusmn2197 for hidden units and splusmn1 02( ) = plusmn1368 for semanticunits Using relatively large negative initial biases improves stability at theonset of learning

Representations The orthographic representations of all words con-tained two syllables chosen from 100 possible rst syllables and 100possible second syllables Each syllable was represented in terms of arandom binary pattern over 15 units with approximately one-third of theunits active for any syllable Thus no attempt was made to introduce anyparticular similarity structure among the syllables

The abstract morphology was modelled after a system with stems andafxes We will refer to the rst syllables as stems for ease of expositionalthough they ranged in transparency from those behaving like fullyindependent morphemes to those with no independent semantic statuswhatsoever Ten of the 100 possible second syllables were selected atrandom and designated as afxes these syllables behaved in a systematicfashion with regard to their occurrence in the language and theirimplications for meaning In particular the transparency of a stem wasreected in the frequency with which it combined with the afxes (asopposed to other second syllables) and in the degree to which themeanings of these stem-afx combinations preserved the canonicalmeanings of the stem and afx

The construction of semantic representations was designed to permit agraded degree of transparency between the orthographic forms of wordsand their meanings This was done by rst assigning each of the 200possible syllables a canonical meaning over 50 semantic features contain-ing 10 randomly selected semantic features for stems and 5 for afxes andother second syllables For any pair of syllables a transparent meaning wasconstructed by including any semantic feature that was in the meaning ofeither syllable (ie bitwise OR) Note that the resulting pattern has a highdegree of overlap with each of the component patterns This transparentmeaning was then distorted to create an actual word meaning by randomlyturning off a specied proportion of its active features and randomlyturning on the same proportion of features elsewhere

Each of the 100 stems was associated with one of four levels ofdistortion resulting in four word classes (a) transparent words retained all

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 23: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 467

of the features of their transparent meaning (b) intermediate wordsretained two-thirds of the features (c) distant words retained one-third ofthe features and (d) opaque words retained none of the features of theirtransparent meaning Thus words sharing a stem had varying degrees ofsemantic similarity as a function of the levels of distortion used in creatingtheir meanings

Using these procedures for generating representations two languageswere created containing 1200 words each In the morphologically richlanguage the rst 60 stems forming 720 words were all transparent in theimpoverished language they were all opaque This contrast formed themain experimental manipulation of the simulation Critically the remain-ing 480 words were identical across the two languages both orthographi-cally and semantically These were formed from 10 transparent stems 10intermediate stems 10 distant stems and 10 opaque stems The simulationwas designed to evaluate the degree of morphological priming among thisshared set of words as a function of the nature of the remaining words ineach of the two languages

Opaque stems differed from the three classes of non-opaque stems (ietransparent intermediate distant) not only in their degree of transparencybut also in their distributional properties with respect to the 10 designatedafxes In particular each opaque stem was combined with 2 randomlychosen afxes and 10 randomly chosen non-afx second syllables to form12 words all of which were assigned opaque meanings The combinationswith afxes are analogous to pseudo-afxed words like CORNER Bycontrast each non-opaque stem was combined with all 10 afxes and with2 randomly chosen non-afxes The afxed items were assigned semanticrepresentations with the specied degree of transparency the non-afxeditems were assigned opaque meanings and are analogous to words likeSPINACH which are not morphologically complex but happen to start withsyllables that can be stems in other contexts

Training procedure Starting with the identical set of initial randomweights the network was trained on either the morphologically rich orimpoverished language For each word presentation the activations oforthographic units were set to the assigned orthographic representation ofthe word and hidden and semantic activations were computed in turnPerformance error was computed in terms of the cross-entropy (Hinton1989) between the semantic activations generated by the network and thecorrect meaning of the word using targets of 08 or greater for presentfeatures and 005 or less for absent features The partial derivative of thiserror with respect to each weight and bias in the network was thencomputed using back-propagation (Rumelhart et al 1986a Werbos 1974)and accumulated over word presentations Weights were updated only

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 24: Are non-semantic morphological effects incompatible with a ...

468 PLAUT AND GONNERMAN

after each sweep through the full 1200 words using a learning rate of00005 momentum of 09 (applied only after the rst 10 sweeps) andweight decay of 00001

The two versions of the network were trained until they were fullyaccurate at activating only the correct semantic features for each word andhad reached equivalent (and extremely low) levels of quantitative errorThis required 300 training sweeps for the morphological rich language and600 sweeps for the impoverished language yielding means of cross-entropyper unit per pattern of 00034 and 00037 respectively

Testing procedure The network was evaluated for its ability toproduce facilitation or inhibition in reaction time (RT) in a paired primingparadigm for each of the four levels of morphological transparency in eachof the two languages Given that the standard computations in afeedforward network do not produce a time-course of processing overthe output units we altered the computation in the network to cascade unitactivations (McClelland 1979) according to the equation

a[t+t]j = t s

i

a[t]i wij + 1 plusmn t( ) a[t]

j (2)

with a time constant t = 001 We then dened RT to be the amount oftime required to reach a stability criterion such that the mean change insemantic activations fell below 001 Note that Equation 2 is equivalent toEquation 1 for t = 10 and reaches stability when a[t+ t]

j = a[t]j which occurs

when a[t]j = s i a

[t]i wij thereby satisfying Equation 1 Thus the cascaded

network asymptotes to the same activations as the standard network(McClelland 1979) The technique of cascading unit activations within afeedforward network has been applied effectively to modelling detailedRT data in other contexts (eg Cohen Dunbar amp McClelland 1990)

On a given trial a prime word was presented to the network by settingorthographic activations to its orthographic representation and initialisinghidden and semantic activations to 01 The network then processed theprime for 10 unit of time (100 unit updates according to Equation 2 witht = 001) The orthography of a target word then replaced that of theprime without altering any other activations in the network and processingcontinued until the semantic activations satised the stability criterion(mean semantic activation change lt 001) in settling to the meaning of thetarget word The time from the onset of the target to when this criterionwas reached was taken as the networkrsquos RT to the target following theprime

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 25: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 469

As the primary goal of the simulation was to evaluate the inuence oflanguage structure on morphological processing all primes and targetswere drawn only from the 480 words that were common across the twolanguages In this way all potential confounds due to differences in surfaceand semantic structure across languages were eliminated Targets consistedof each afxed word generated from the transparent intermediate distantand opaque stems There were 100 targets in each of the rst three classesand 20 in the last class (as these are the pseudo-afxed items)

Many empirical studies employ primes that are individually matched toeach target word To derive a more reliable estimate of priming effectshowever we computed RTs for each target word proceeded by everyprime word having the appropriate relationship to the target wordIdentical primes were of course the targets themselves Morphologicalprimes consisted of all non-identical afxed words with the same stem(and hence the same level of morphological transparency) There werenine morphological primes for each non-opaque target (900 total prime-target pairs) and one for each opaque target (20 prime-target pairs)Control primes consisted of non-afxed opaque words with no syllables incommon with the target There were 90 of these for each opaque target (9non-identical stems 10 non-afxed second syllables 1800 total prime-target pairs) As there were far more non-opaque targets we randomlysampled among possible control primes for these targets to produceapproximately 2000 prime-target pairs for each level of transparencyTable 1 gives examples from English that are analogous to each of theexperimental conditions used in testing the network

It should be pointed out that the identity condition for the network is notdirectly analogous to that for human subjects in that no change in thestimulus occurs between prime and targetmdashit is exactly equivalent tomeasuring the networkrsquos RT to the target and then subtracting the primeduration of 10 By contrast identity conditions in empirical studies employsome physical change in the stimulus such as increasing its size or changingfrom lower to upper case Insofar as these alterations slow RTs to thetarget the networkrsquos RTs will underestimate those of human subjects

TABLE 1English examples of experimental conditions

Priming condition

Transparency Target Identical Morphological Control

Transparent RUNNER RUNNER RUNNING SPINACH

Intermediate DRESSER DRESSER DRESSIN G SPINACH

Distant TENDER TENDER TENDING SPINACH

Opaque CORN ER CORNER CORN ISH SPINACH

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 26: Are non-semantic morphological effects incompatible with a ...

470 PLAUT AND GONNERMAN

Results and discussion

Figure 6 shows the total error produced by the network over the course oftraining on either the morphologically rich or impoverished languagesWith the same architecture and starting from the same initial randomweights the network clearly acquires the morphologically rich languagemore quickly than the impoverished one requiring only half as manysweeps through the 1200-word corpus to achieve the same nal level ofperformance This difference arises directly from the different degrees ofsystematicity across the two languages as discussed under Principle 2earlier It is also broadly consistent with cross-linguistic differences in ratesof morphological acquisition (see eg Andersen 1992)

Each version of the network was then run on the identical set of prime-target pairs as described in the Method section and its RTs weremeasured Table 2 presents the mean RTs of the network to target wordsof each level of systematicity as a function of the type of the precedingprime word and the language on which it was trained The RTs were thenentered into a 2 3 4 ANOVA with items as the random variablelanguage (rich impoverished) and priming condition (identical morpho-

Figure 6 Total crossndashentropy error for the entire training corpus produced by the networkas a function of sweeps through the corpus for either the morphologically rich orimpoverished language

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 27: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 471

logical control) as within-item factors and morphological transparency(transparent intermediate distant opaque) as a between-item factorThere was no need to perform any data trimming as the network operateddeterministically and made no errors

The ANOVA revealed a main effect of language with targets producingfaster RTs in the morphologically rich language compared with theimpoverished language (means of 419 vs 427 respectively F1316 = 3442p lt 001) a main effect of priming condition with identity primes yieldingthe fastest RTs followed by morphological primes and then controlprimes (means of 353 446 469 respectively F2316 = 34526 p lt 001)and a main effect of morphological transparency (means of 391 429 444and 443 for transparent intermediate distant and opaque targetsrespectively F3316 = 2522 p lt 001) However priming conditioninteracted with both language and transparency such that the effect ofpriming condition was greater in the impoverished compared with the richlanguage (F2632 = 1449 p lt 001) and with decreasing morphologicaltransparency (F6632 = 2066 p lt 001)

Given the concerns about the equivalence of the identity condition forthe network compared with that in empirical studies a second 2 2 4ANOVA was carried out excluding this priming condition This ANOVAproduced exactly the same pattern of results as the rst

The effects of priming condition and morphological transparency arestraightforward and follow from the principles underlying the behaviourof connectionist networks For both factors RTs were faster when theprime had greater orthographic and semantic overlap with the targetThe effects of language are more interesting as these reect not the

TABLE 2Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 328 (002) 395 (002) 441 (001) 111 045Intermediate 358 (002) 447 (002) 470 (002) 113 026Distant 369 (002) 467 (002) 482 (002) 113 014Opaque 363 (005) 467 (005) 475 (004) 113 008

Impoverished languageTransparent 328 (002) 407 (002) 447 (002) 117 037Intermediate 360 (003) 462 (003) 477 (002) 118 017Distant 372 (003) 484 (002) 489 (002) 117 005Opaque 371 (006) 491 (005) 489 (005) 118 plusmn002

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 28: Are non-semantic morphological effects incompatible with a ...

472 PLAUT AND GONNERMAN

primes and targets themselvesmdashwhich were held constant across thelanguagesmdashbut the different contexts in which these words were learned(see Principle 5 earlier) RTs were faster in the morphologically richlanguage than in the impoverished language because the overallorganisation of the internal representations within the network is moremorphologically structured in the former case This organisation supportsmore effective processing of primes and targets that at least to somedegree share this morphological structure The effect of primingcondition was stronger in the impoverished language essentially becausethere is more room for improvement in performance on the targets whenbase RTs are slower (see also Plaut 1995 Plaut amp Booth in press Plautet al 1996)

Critically the effect of language context on performance is not restrictedto morphologically structured words Indeed in a separate plannedcomparison restricted only to opaque targets and the morphological andcontrol priming conditions there was a reliable interaction of language andpriming condition (F119 = 7269 p = 0143) such that priming was greaterin the rich language In fact separate t-tests showed that morphologicalpriming (ie RTs following control primes minus RTs followingmorphological primes) was reliable in the rich language (t19 = 3780 p =

001) but not in the impoverished language (t19 = 0536 p = 598) Figure 7shows the magnitudes of morphological priming at each level oftransparency for the two languages

Thus as suggested in the discussion of Principle 5 the way in which a setof semantically opaque pseudo-afxed items (eg CORN ER) is representedand processed by a connectionist network depends on the nature of theentire task on which it is trained If the overall language has relatively littlemorphological structure as in English then opaque items are free todevelop fairly idiosyncratic representations even though their surfaceforms happen to contain segments (eg CORN -ER) that operate asmorphemes in other contexts Hence processing one such item as a primewill have little impact on processing another as target relative to non-afxed control primes By contrast in a language that is highlymorphologically structured the processing of the stem and pseudo-afxin opaque items is more strongly inuenced by how these elements arehandled throughout the rest of the language Their representation inopaque items necessarily has some similarity with their representation inmore semantically transparent items and hence with each other Thus theprocessing of a pseudo-afxed item will be facilitated by processinganother such item with the same stem or afx Note that in this context itisnrsquot strictly appropriate to call such items pseudo-afxed as although theyare semantically opaque they nonetheless participate in the morphologicalstructure of the entire language

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 29: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 473

To demonstrate the effect of training context on the internalrepresentations developed by the network we determined the hiddenrepresentation in each version of the network of each word shared acrossthe two languages We then compared their pairwise similarities measuredin terms of Pearson correlation coefcients as a function of the level oftransparency of the two words and the nature of their orthographicoverlap if any

Overall the representations of all non-identical pairs of these commonitems were more similar in the morphologically rich language (mean r =

014) than in the impoverished language (mean r = 006 t102078 = 7565 plt 001) Not surprisingly this effect was carried largely by pairs with somedegree of orthographic overlap Across languages the representations ofpairs of items sharing a stem were more similar to each other (mean r =

052) than were pairs sharing second syllables (mean r = 033 t99358 =20023 p lt 001) which in turn were more similar than pairs with nocommon syllables (mean r = 006 t12646 = 6478 p lt 001) The lattereffect can be attributed to orthographic factors alone but the difference

Figure 7 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 30: Are non-semantic morphological effects incompatible with a ...

474 PLAUT AND GONNERMAN

between overlapping in stem versus second syllable reects the relativenumber of features in the canonical semantic representations of theseitems

Table 3 lists the mean correlation coefcients for word pairs that overlapin either stem or afx as a function of morphological transparency andlanguage (and t-tests for effects of language) Pairs of items sharing stemswere more similar in the rich compared with the impoverished language ateach level of morphological transparency These differences give rise tothe effects of language on morphological priming that are apparent inFigure 7 Note in particular that the representations of opaque itemssharing a stem are substantially more similar when learned within themorphologically rich language than when the same items are learnedwithin the impoverished language

Interestingly none of the comparisons for afx overlap were reliablealthough there may be a weak trend (given so few items) in favour of therich language for opaque items This suggests that the afxes themselvesare treated in a systematic fashion regardless of the overall level oftransparency of the language this is presumably because there are only 10of them and they are very common amounting to 25 of all secondsyllables in the impoverished language and 85 in the rich languageThese levels of overlap are of course much higher than for non-overlapping word pairs (mean r = 006) These ndings suggest that thesystem would exhibit afx priming (ie DARKNESS priming TOUGHN ESS) inboth morphologically rich and impoverished languages It is important tonote though that the afxes in the simulation even in the impoverishedlanguage are all highly transparent productive and have unreduced (eg

TABLE 3Mean Pearson correlation coefreg cients for word pairs as a functionof language and morphological transparency and t-tests for effects

of language

Language

Transparency Rich Impoverished t-test

Stem overlapTransparent 058 042 t898 = 238 p lt 001Intermediate 062 046 t898 = 237 p lt 001Distant 063 043 t898 = 286 p lt 001Opaque 060 039 t19 = 422 p = 001

Afx overlapTransparent 034 034 t898 = 058 p = 57Intermediate 034 033 t898 = 103 p = 30Distant 033 033 t898 = 047 p = 64Opaque 037 029 t26 = 127 p = 21

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 31: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 475

syllabic) phonological forms In fact empirical studies employing afxes ofthis sort have found reliable priming both in Hebrew (eg Deutsch Frostamp Forster 1998) and in English (eg Marslen-Wilson Ford Older ampZhou 1996)

SIMULATION 2

The rst simulation employed an abstraction of morphological structure inwhich all of the words formed from a given stem had the same degree ofsemantic transparency Although this organisation may be sufcient todemonstrate graded inuences of transparency it is in many ways a poorapproximation of the morphology of real languages in which there isgenerally a range of transparency among words generated from a commonstem For example the English stem SHORT is found in the transparentforms like SHORTN ESS and SHORTEST intermediate forms like SHORTAGESHORTLY and SHORTS and opaque forms like SHORTENIN G Thus it seemsimportant to conrm that the cross-linguistic differences in non-semanticmorphological priming observed in Simulation 1 were not due toidiosyncratic aspects of the form of morphological structure that wasemployed Accordingly we carried out a second simulation that was asclosely analogous to the rst simulation as possible except that semantictransparency varied in a graded manner within the set of words derivedfrom each stem

Method

Network architecture The network architecture was identical to thatused in Simulation 1 (see Figure 5)

Representations The orthographic representations and lsquolsquotransparentrsquorsquosemantic representations were the same as in Simulation 1 the onlydifference was in how these semantics were distorted to form wordmeanings As in Simulation 1 12 words were formed from each of 100stems with 40 lsquolsquoexperimentalrsquorsquo stems forming words that were commonacross the two languages Ten of the words derived from eachexperimental stem were created by combining with each of the 10afxes Of these 10 words two were transparent (no distortion totransparent semantics) two were intermediate (one-third of featuresregenerated randomly) two were distant (two-thirds of featuresregenerated) two were opaque (all features regenerated) and theremaining two were assigned to one of these transparency levelsrandomly The other two words derived from each experimental stemwere formed using randomly selected non-afx second syllables and wereassigned opaque semantics Thus as in Simulation 1 the 40 experimental

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 32: Are non-semantic morphological effects incompatible with a ...

476 PLAUT AND GONNERMAN

stems created (approximately) 100 words at each of four levels ofsemantic transparencymdashalthough now every stem contributed to everytransparency levelmdashas well as 80 additional opaque words containingnon-afx second syllables

The remaining 60 stems (720 words) were treated exactly as inSimulation 1 In the morphologically rich language each of these stemsformed 10 semantically transparent words using each of the 10 afxes andtwo opaque forms using other second syllables In the impoverishedlanguage they formed 12 semantically opaque words (two with randomlyselected afxes and 10 with other second syllables)

Training procedure The training procedure was the same as inSimulation 1 The network had to be trained slightly longer than in therst simulation in order to reach equivalent levels of cross-entropy errorFor the rich language the network required 360 sweeps through the 1200-word training corpus to achieve a mean cross-entropy error per semanticunit per pattern of 00033 for the impoverished language 650 sweeps wererequired to achieve a mean of 00029

Testing procedure The testing procedure was the same as inSimulation 1

Results and discussion

Table 4 presents the mean RTs of the network to target words of each levelof systematicity as a function of priming context and language The RTswere entered into a 2 3 4 ANOVA with stem as the random variableand language (rich impoverished) priming condition (identical morpho-logical control) and morphological transparency (transparent intermedi-ate distant opaque) as within-item factors The ANOVA revealed maineffects of language (F139 = 1002 p lt 001) of priming condition (F278 =

14727 p lt 001) and of transparency (F3117 = 6411 p lt 001) and two-way interactions of transparency with language (F278 = 1856 p lt 001)and with priming context (F6234 = 1867 p lt 001) In addition the three-way interaction of transparency context and language was also reliable(F6234 = 364 p lt 01) such that the decrease in priming as a function oftransparency was greater in the impoverished compared with richlanguage The directions of these effects were all in accordance with theresults of Simulation 1

Asecond 2 2 4 ANOVA that compared only the morphological andcontrol priming conditions produced the same pattern of effects Figure 8shows the magnitudes of morphological priming (ie control conditionminus morphological condition) at each level of transparency for the two

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 33: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 477

TABLE 4Mean RTs (and standard errors) to target words as a function of language and priming

condition and magnitudes of priming effects

Priming condition Priming

Transparency Identical Morphological Control Identity Morphological

Rich languageTransparent 343 (002) 406 (003) 471 (002) 128 065Intermediate 354 (003) 454 (003) 484 (002) 130 030Distant 354 (003) 466 (002) 486 (002) 130 020Opaque 364 (002) 476 (002) 496 (002) 131 018

Impoverished languageTransparent 351 (002) 419 (003) 472 (002) 122 054Intermediate 360 (003) 468 (003) 480 (003) 120 012Distant 364 (003) 482 (003) 485 (003) 120 002Opaque 374 (003) 493 (002) 494 (002) 120 001

Figure 8 Mean priming measured as mean RT following control primes minus mean RTfollowing morphological primes as a function of the transparency of the target word and thelanguage on which the network was trained

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 34: Are non-semantic morphological effects incompatible with a ...

478 PLAUT AND GONNERMAN

languages As in Simulation 1 a planned comparison restricted only toopaque targets found a reliable interaction of language and primingcondition (F139 = 1223 p lt 001) such that priming was greater in the richlanguage Separate t-tests showed that morphological priming was reliablein the rich language (paired t39 = 1256 p lt 0001) but not in theimpoverished language (paired t39 = 040 p = 069)

Thus the current simulation replicated the main ndings of Simulation1 (a) morphological priming effects varied as a function of graded degreesof semantic transparency of primes and targets and (b) semanticallyopaque forms yielded priming only in the morphologically rich languageIndeed the use of more natural morphological structure in which semantictransparency varied both within and between stems yielded an evenstronger cross-linguistic difference in priming for opaque (and evendistant) forms

GENERAL DISCUSSION

Traditional theories of lexical processing assume that morphology isreected explicitly in the structure of the lexicon words are built out ofdiscrete units called morphemes and a given word either is or isnrsquotdecomposed into constituent morphemes at each stage of processing Infact given a reliance on symbolic rule-based computation as the basis forlanguage knowledge it is difcult to imagine an alternative What would itmean to say that one symbol (a word) can be decomposed into componentsymbols (morphemes) partially

Distributed connectionist modelling offers an alternative Words arerepresented as alternative patterns of activity over multiple groups ofunits and subpatterns of those patterns can exhibit varying degrees ofstability and independence in a natural way Connectionist networks aresometimes described as simple lsquoassociationistrsquo (Pinker 1991) or lsquocorrela-tionalrsquo (Frost et al in press-a) systems but such descriptions misrepresentthe computational sophistication of the approach Networks can learninternal representations that reect the underlying functional structure ofthe domains in which they are trained and these representations cansupport effective handling of exceptional knowledge without compromis-ing generalisation to novel forms (Plaut et al 1996)

On a connectionist account morphological processing reects a learnedsensitivity to the systematic relationships among the surface forms ofwords and their meanings Given that these relationships exhibit gradeddegrees of systematicity the approach predicts that the magnitudes ofbehavioural measures of morphological processing such as primingshould vary continuously as a function of the degree of semantic and

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 35: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 479

formal transparency of stimuli These predictions were recently conrmedby Gonnerman et al (2000a)

The connectionist account would also seem to be supported by failuresto nd priming among morphologically related words in English that arenot semantically related (Feldman amp Soltano 1999 Marslen-Wilson et al1994) By the same token however evidence for non-semantic morpho-logical priming in Hebrew (Bentin amp Feldman 1990 Frost et al 1997 inpress-a in press-b) has been assumed to present a challenge for theaccount It turns out however that the predictions of the connectionistaccount in this context depend on the morphological structure of thelanguage as a whole and Hebrew is morphologically much richer thanEnglish

We reported on simulations in which a network was trained on abstractversions of either a morphologically rich or impoverished language andthen tested for morphological priming among a set of words in bothlanguages that varied in semantic transparency The network exhibitedgraded morphological priming as a function of semantic transparency inboth languages (consistent with Gonnerman et al 2000a) Moreimportantly there was reliable priming for morphologically related butsemantically opaque items in the rich language (consistent with Bentin ampFeldman 1990 Frost et al in press-a in press-b) but not in theimpoverished language (cosistent with Feldman amp Soltano 1999Marslen-Wilson et al 1994) Morphologically related but semanticallyopaque items in the rich language exhibit priming because the organisationof the internal representations in the network are dominated by thepervasive morphological structure of the language to such an extent thateven opaque items participate in it By contrast in the impoverishedlanguage the same items are free to behave idiosyncratically and thusexert little inuence on each other These ndings suggest that rather thanbeing challenged by the occurrence in Hebrew of morphological effects inthe absence of semantic similarity the connectionist approach may be ableto explain the cross-linguistic differences in the occurrence of these effects

To be clear we do not consider the current work in and of itself toconstitute an account of the relevant empirical ndings in English or inHebrew Indeed we employed tasks that bear only the most abstractrelationship to actual morphological systems To be more realistic thesystem would need to be modied both with regard to the tasks it wouldperform and the input it would receive For example children learn therelationships between meaning and sound before they learn anythingabout orthography While effects of literacy on interpretation ofmorphologically complex words have yet to be spelled out for English orHebrew knowledge of orthographic patterns will undoubtedly impactultimate representations since some morphological relationships in

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 36: Are non-semantic morphological effects incompatible with a ...

480 PLAUT AND GONNERMAN

English are maintained in orthography but obscured in speech (eg SIGNndashSIGN AL) An adequate account of the empirical results in our view wouldbe based on a simulation in which orthographic phonological andsemantic representations all interact in the process of interpreting writtenand spoken words (see Figure 1) Moreover the orthographic andphonological representations would need to approximate the actualsystems of the two languages and the semantic representations wouldneed to capture salient aspects of the relationships among word meaningsIn particular constructing an adequate approximation of Hebreworthography presents a number of challenges in that some aspects ofword patterns are represented with diacritics and others are typicallyomitted in material for adults Moreover the richness of Hebrewmorphology manifests not only in graded semantic transparency but also(and perhaps primarily) in the extensive number of inections andderivations which common phonological structure across words Bycontrast our abstract rich and impoverished languages differed only inthe token frequencies of afxes but not in their type frequenciesAdditionally while we examined degree of semantic transparency bothacross word forms (Simulation 1) and within them (Simulation 2) we didnot manipulate other distributional characteristics that have been shown tobe empirically relevant For example differential effects for morphologi-cally complex words based on cumulative root frequency compared tosurface-form frequency have been demonstrated for English (Taft 1979)and in other languages (Italian Burani amp Caramazza 1987 French Coleet al 1989) A more adequate model would thus also incorporate realistictype and token frequencies of both complex words and related simplewords We leave the development of such simulation to future work

It is also important to be clear that the connectionist approach does notentail the elimination of morphology as an important level of analysis oflinguistic structure and psycholinguistic behaviour To the contrary on ourview the morphological structure of a language has a fundamental impacton the organisation of lexical representation and processing The criticaldifference between our perspective and more traditional ones is thatrather than stipulate morphological principles as facts that are independentof other aspects of the lexicon we seek to derive them from independentlymotivated assumptions about the nature of orthographic phonological andsemantic representations in a language and the nature of learning andprocessing in the underlying (connectionist) mechanism Note thatmorphology is no different than other levels of representation in thisrespect For example phonological representations can be understood asarising from learned relationships among acoustic semantic and articu-latory information (Plaut amp Kello 1999) and semantic representations canbe understood as arising from learned relationships among multiple input

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 37: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 481

and output modalities (McGuire amp Plaut 1997 Plaut in press inpreparation Rogers amp Plaut in press) By adopting a multi-prongedapproach in which different simulations make linked approximations in theservice of addressing a range of specic issues in lexical processing we seekto develop a computationally explicit and empirically adequate theoreticalaccount of language behaviour that spans perception to action

REFERENCES

Andersen ES (1992) Complexity and language acquisition Inuences on the developmentof morphological systems in children In JA Hawkins amp M Gell-Mann (Eds) Theevolution of human language pp 241ndash272 Reading MA Addison-Wesley

Andrews S (1986) Morphological inuences on lexical access Lexical or nonlexical effectsJournal of Memory and Language 25 726ndash740

Aronoff M (1976) Word formation in generative grammar Cambridge MA MIT PressBentin S amp Feldman LB (1990) The contribution of morphological and semantic

relatedness to the repetition effect at long and short lags Evidence from HebrewQuarterly Journal of Experimental Psychology 42A 693ndash711

Bentin S amp Frost R (1995) Morphological factors in visual word identication in HebrewIn LB Feldman (Ed) Morphological aspects of language processing pp 271ndash292Hillsdale NJ Lawrence Erlbaum Associates Inc

Berman RA (1978) Modern Hebrew structure Tel Aviv Israel University PublishingProjects

Burani C amp Caramazza A (1987) Representation and processing of derived wordsLanguage and Cognitive Processes 2 217ndash227

Burani C Salmaso D amp Caramazza A (1984) Morphological structure and lexical accessVisible Language 18 419ndash427

Bybee JL (1985) Morphology A study of the relation between meaning and formPhiladelphia PA John Benjamins

Chauvin Y amp Rumelhart DE (Eds) (1995) Back-propagation Theory architectures andapplications Hillsdale NJ Lawrence Erlbaum Associates Inc

Chomsky N (1957) Syntactic structures The Hague MoutonChomsky N (1965) Aspects of the theory of syntax Cambridge MA MIT PressChomsky N amp Halle M (1968) The sound pattern of English New York Harper amp

RowCohen JD Dunbar K amp McClelland JL (1990) On the control of automatic processes A

parallel distributed processing account of the Stroop effect Psychological Review 97(3)332ndash361

Cole P Beauvillain C amp Segui J (1989) On the representation and processing of prexedand sufxed words A differential frequency effect Journal of Memory and Language 281ndash13

Cottrell GW amp Plunkett K (1995) Acquiring the mapping from meanings to soundsConnection Science 6 379ndash412

Crain S (1991) Language acquisition in the absence of experience Behavioral and BrainSciences 14 597ndash650

Deutsch A Frost R amp Forster KI (1998) Verbs and nouns are organized and accesseddifferently in the mental lexicon Evidence from Hebrew Journal of ExperimentalPsychology Learning Memory and Cognition 24(5) 1238ndash1255

Feldman LB (Ed) (1995) Morphological aspects of language processing Hillsdale NJLawrence Erlbaum Associates Inc

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 38: Are non-semantic morphological effects incompatible with a ...

482 PLAUT AND GONNERMAN

Feldman LB amp Soltano EG (1999) Morphological priming The role of prime durationsemantic transparency and afx position Brain and Language 68 33ndash39

Fodor JA amp Pylyshyn ZW (1988) Connectionism and cognitive architecture A criticalanalysis Cognition 28 3ndash71

Fowler C Napps S amp Feldman LB (1985) Relations among regular and irregularmorphologically related words in the lexicon as revealed by repetition priming Memoryand Cognition 13 241ndash255

Frost R Deutsch A amp Forster KI (in press) Decomposing morphologically complexwords in a nonlinear morphology Journal of Experimental Psychology LearningMemory and Cognition

Frost R Deutsch A Gilboa O Tannenbaum M amp Marslen-Wilson W (in press)Morphological priming Dissociation of phonological semantic and morphologicalfactors Memory and Cognition

Frost R Forster KI amp Deutsch A (1997) What can we learn from the morphology ofHebrew A maskedndashpriming investigation of morphological representation Journal ofExperimental Psychology Learning Memory and Cognition 23(4) 829ndash856

Gonnerman LM (1999) Morphology and the lexicon Exploring the semanticsndashphonologyinterface Unpublished doctoral dissertation Department of Linguistics University ofSouthern California Los Angeles CA

Gonnerman LM Andersen ES amp Seidenberg MS (2000a) Graded semantic andphonological effects in priming Evidence for a distributed connectionist approach tomorphology Manuscript submitted for publication

Gonnerman LM Devlin JT Andersen ES amp Seidenberg MS (2000b) Derivationalmorphology as an emergent interlevel representation Manuscript submitted for publica-tion

Grainger J Cole P amp Segui J (1991) Masked morphological priming in visual wordrecognition Journal of Memory and Language 30 370ndash384

Harm MW amp Seidenberg MS (1999) Phonology reading acquisition and dyslexiaInsights from connectionist models Psychological Review 106(3) 491ndash528

Hertz J Krogh A amp Palmer RG (1991) Introduction to the theory of neural computationReading MA Addison-Wesley

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp 77ndash109 Cambridge MA MIT Press

Hinton GE amp Sejnowski TJ (1986) Learning and relearning in Boltzmann Machines InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Volume 1 Foundations pp282ndash317 Cambridge MA MIT Press

Hoeffner JH amp McClelland JL (1993) Can a perceptual processing decit explain theimpairment of inectional morphology in developmental dysphasia A computationalinvestigation In EV Clark (Ed) Proceedings of the 25th Annual Child LanguageResearch Forum pp 38ndash49 Stanford CA Center for the Study of Language andInformation

Joanisse MF amp Seidenberg MS (1999) Impairments in verb morphology after braininjury A connectionist model Proceedings of the National Academy of Science USA 967592ndash7597

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 39: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 483

Kello CT amp Plaut DC (1998) Modulation of lexical effects in word reading withoutstrategic control of pathways Evidence from tempo naming [Abstract 73] Abstracts of thePsychonomic Society 3 7 (Proceedings of the 39th Annual Meeting Dallas TX)

Kello CT amp Plaut DC (in preparation) Strategic control by gain manipulation in acomputational model of word reading in the tempo naming task

Kello CT amp Plaut DC (in press) Strategic control in word reading Evidence fromspeeded responding in the tempo naming task Journal of Experimental PsychologyLearning Memory and Cognition

Kempley ST amp Morton J (1982) The effects of priming with regularly andirregularly related words in auditory word recognition British Journal of Psychology73 441ndash454

Laudanna A Badecker W amp Caramazza A (1989) Priming homographic stems Journalof Memory and Language 28 531ndash546

Laudanna A Badecker W amp Caramazza A (1992) Processing inectional andderivational morphology Journal of Memory and Language 31(3) 333ndash348

Laudanna A Cermele A amp Caramazza A (1997) Morphondashlexical representations innaming Language and Cognitive Processes 12(1) 49ndash66

Marchand H (Ed) (1969) The categories and types of presentndashday English wordndashformationMunich CH Becksche Verlagsbuchhandlung

Marslen-Wilson W Tyler LK Waksler R amp Older L (1994) Morphology and meaningin the English mental lexicon Psychological Review 101(1) 3ndash33

Marslen-Wilson WD Ford M Older L amp Zhou X (1996) The cominatorial lexiconPriming derivational afxes In G W Cottrell (Ed) Proceedings of the 18th AnnualConference of the Cognitive Science Society pp 223ndash227) Hillsdale NJ LawrenceErlbaum Associates Inc

McClelland JL (1979) On the time relations of mental processes An examination ofsystems of processes in cascade Psychological Review 86 287ndash330

McClelland JL Rumelhart DE amp the PDP Research Group (Eds) (1986) Paralleldistributed processing Explorations in the microstructure of cognition Volume 2Psychological and biological models Cambridge MA MIT Press

McClelland JL St John M amp Taraban R (1989) Sentence comprehension A paralleldistributed processing approach Language and Cognitive Processes 4 287ndash335

McGuire S amp Plaut DC (1997) Systematicity and specialization in semantics Acomputational account of optic aphasia In Proceedings of the 19th Annual Conferenceof the Cognitive Science Society pp 502ndash507 Hillsdale NJ Lawrence Erlbaum AssociatesInc

McLeod P Plunkett K amp Rolls ET (1998) Introduction to connectionist modelling ofcognitive processes Oxford UK Oxford University Press

Minsky M amp Papert S (1969) Perceptrons An introduction to computational geometryCambridge MA MIT Press

Murrell GA amp Morton J (1974) Word recognition and morphemic structure Journal ofExperimental Psychology 102 963ndash968

Napps SE (1989) Morphemic relationships in the lexicon Are they distinct from semanticand formal relationships Memory and Cognition 17 729ndash739

Napps SE amp Fowler CA (1987) Formal relationships among words and the organizationof the mental lexicon Journal of Psycholinguistic Research 16 257ndash272

Page M (in press) Connectionist modelling in psychology A localist manifesto Behavioraland Brain Sciences

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S (1994) The language instinct New York Morrow

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 40: Are non-semantic morphological effects incompatible with a ...

484 PLAUT AND GONNERMAN

Plaut DC (1991) Connectionist neuropsychology The breakdown and recovery of behaviorin lesioned attractor networks Unpublished doctoral dissertation School of ComputerScience Carnegie Mellon University (Available as Technical Report CMU-CS-91-185)

Plaut DC (1995) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (in preparation) Systematicity topography and graded modality-specicspecialization in semantics An account of optic aphasia

Plaut DC (in press) Systematicity and specialization in semantics In GW HumphreysD Heinke amp A Olson (Eds) Proceedings of the Fifth Neural Computation andPsychology Workshop Connectionist models in cognitive neuroscience New YorkSpringer-Verlag

Plaut DC amp Booth JR (in press) Individual and developmental differences in semanticpriming Empirical and computational support for a single-mechanism account of lexicalprocessing Psychological Review

Plaut DC amp Kello CT (1999) The interplay of speech comprehension and production inphonological development A forward modeling approach In B MacWhinney (Ed) Theemergence of language (pp 381ndash415) Mahwah NJ Lawrence Erlbaum Associates Inc

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th AnnualConference of the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plaut DC amp Shallice T (1993) Deep dyslexia A case study of connectionistneuropsychology Cognitive Neuropsychology 10(5) 377ndash500

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Quinlan P (1991) Connectionism and psychology A psychological perspective on newconnectionist research Chicago University of Chicago Press

Rogers TT amp Plaut DC (in press) Connectionist perspectives on categoryndashspecicdecits In E Forde amp GW Humphreys (Eds) Category-specicity in brain and mindNew York Psychology Press

Rueckl JG amp Dror IE (1994) The effect of orthographicndashsemantic systematicity on theacquisition of new words In C Umilta amp M Moscovitch (Eds) Attention andPerformance XV Conscious and nonconscious aspects of cognitive processing pp 571ndash588 Cambridge MA MIT Press

Rueckl JG Mikolinski M Raveh M Miner CS amp Mars F (1997) Morphologicalpriming fragment completion and connectionist networks Journal of Memory andLanguage 36(3) 382ndash405

Rueckl JG amp Raveh M (1999) The inuence of morphological regularities on thedynamics of a connectionist network Brain and Language 68 110ndash117

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognitionVolume 1 Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE McClelland J L amp the PDP Research Group (Eds) (1986b) Paralleldistributed processing Explorations in the microstructure of cognition Volume 1Foundations Cambridge MA MIT Press

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161

Page 41: Are non-semantic morphological effects incompatible with a ...

A CONNECTIONIST APPROACH TO MORPHOLOGY 485

Schreuder R amp Baayen H (1995) Modeling morphological processing In LB Feldman(Ed) Morphological aspects of language processing pp 131ndash154 Hillsdale NJ LawrenceErlbaum Associates Inc

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Smolensky P Mozer MC amp Rumelhart DE (Eds) (1996) Mathematical perspectives onneural networks Hillsdale NJ Lawrence Erlbaum Associates Inc

Stanners RR Neiser JJ Hernon WP amp Hall R (1979) Memory representation formorphologically related words Journal of Verbal Learning and Verbal Behavior 18 399ndash412

Stolz JA amp Besner D (1998) Levels of representation in visual word recognition Adissociation between morphological and semantic processing Journal of ExperimentalPsychology Human Perception and Performance 24(6) 1642ndash1655

Taft M (1979) Lexical access via an orthographic code The Basic Orthographic SyllableStructure (BOSS) Journal of Verbal Learning and Verbal Behaviour 18 21ndash39

Taft M amp Forster K (1975) Lexical storage and retrieval of prexed words Journal ofVerbal Learning and Verbal Behaviour 14 638ndash647

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Werbos PJ (1974) Beyond regression New tools for prediction and analysis in the behavioralsciences Unpublished doctoral dissertation Harvard University

Zorzi M Houghton G amp Butterworth B (1998) Two routes or one in reading aloud Aconnectionist lsquolsquodual-processrsquorsquo model Journal of Experimental Psychology HumanPerception and Performance 24 1131ndash1161