A Connectionist Model of Competition at the Phonological-lexical
Interface during Speech Perception
An honors project for the Departments of Computer Science and Psychology
By Eric M. Forbell
Bowdoin College, 2000
© 2000 Eric Forbell
Table of Contents
Abstract
1. Introduction
1.1 An overview of the TRACE and multiTRACE models
1.1.1 The original TRACE model
1.1.2 MultiTRACE
1.2 Speech perception
1.2.1 Lexical priming
2. Speech Simulation
2.0.1 Phonemic layer
2.0.2 Transitional layer
2.0.3 Lexical layer
2.1 Lateral inhibition and lexical competition
2.1.1 Lexicon organization
2.2 Sampling lexical networks
2.2.1 “Subject” networks
3. Simulating the lexical priming experiments
3.1 Experiment 1
3.1.1 Design
3.1.2 Results and discussion
3.2 Experiment 2
3.2.1 Design
3.2.2 Results and discussion
4. Other modeling issues
4.1 Speech speed
4.2 Sequence bias
4.3 Error rate and lateral inhibition
4.4 Word representation
4.5 Typicality effect
5. Conclusion
Acknowledgements
References
Abstract
A connectionist architecture comprised of Hebbian cell assemblies was developed
and applied to the problem of speech recognition at the phonemic-lexical interface.
Speech was encoded in the model as the sequential activation of phoneme representations
connected to higher-level linguistic structures. An architectural decision concerning the
spatial organization of the top-level lexical map was supported by psycholinguistic
evidence suggesting a topographical layout in which cognitive distance was related to
initial phonological structure. Through computer simulation, lateral inhibition at this
lexical level was shown to be a necessary and sufficient mechanism to replicate the
findings of a series of lexical priming experiments. The results of these experiments
provided added support for a more general theory of cognition based upon Hebb’s
original cell assembly hypothesis.
1. Introduction
The brain and its emergent high-level cognitive functions are currently studied
from many different perspectives. Until recently, cognition was of interest only to
psychologists and philosophers, but that group has since expanded to include seemingly
unrelated fields such as computer science and mathematics. Many characteristics of
current approaches to cognition are due to contributions from researchers in these fields.
One such approach, called connectionism, was first introduced in the late 1950s
by the psychologist Frank Rosenblatt, following several computational discoveries in
theoretical computer science. He specified the design of a computational model called a
Perceptron, using simple computing elements connected in parallel, that attempted to
mimic the organization of the brain that had been found in early visual processing areas
(Rosenblatt, 1958). His specification involved clustering units into layers that impinge
upon higher and higher layers, though he was only able to successfully train networks of
two levels. The pattern classifications that can be solved using this architecture are
limited to those that are linearly separable, a fact later proven by two computer
scientists, Minsky and Papert (1969). The logic function exclusive-or (XOR), for
example, corresponds to a classification that cannot be represented by this architecture,
because the resulting category members are not linearly separable. Minsky and Papert’s
(1969) book, Perceptrons, was perhaps solely responsible for the lack of connectionist
research over the next twenty years because it stressed the incredibly difficult learning
problems associated with multi-layer networks.
While the connectionist approach was largely ignored, an alternative approach to
cognition, and intelligence in general, formed, and it was grounded in a commitment to
producing practical results. Artificial Intelligence research in the 60’s and 70’s
abandoned simulating the lower-level structure of the brain and focused on modeling
high-level cognitive concepts known as symbols. During this era humans were compared
to computers in that they are both types of information processors: they receive stimuli
from the environment, perform some kind of internal processing and then produce a
result. Computers were easily programmed to manipulate symbols and so it was assumed
that humans must compute in the same way.
It was not long before the limitations of symbol processing —the manipulation of
symbols according to the principles of second-order logic—started to weaken the
approach of the classical artificial intelligence researcher. Certain problems, such as
those involving pattern recognition and classification, tasks that humans excel at, were
simply not possible using this symbol framework because they were not well-grounded in
a representation of the environmental stimulus or they became intractable due to
complexity issues. These emerging limitations, together with the rekindling of research
on connectionist networks, changed the course of research in the early 1980s.
The possibilities of connectionism were reinvestigated at this time, as it was
thought that an extension of the information-processing model using a massively parallel
architecture could be useful (Feldman & Ballard, 1982; McClelland, Rumelhart, &
Hinton, 1986). The number of time steps a typical artificial intelligence program used to
solve a problem at that time was much too large to be biologically feasible, so a shift to a
connectionist framework seemed inevitable. The main issues with connectionist models
at the time were network stability, dealing with noise, forming representations of
sequences, and representing high-level cognitive concepts (Feldman & Ballard, 1982).
Despite the progress to date, these issues remain unresolved.
The connectionist modeling approach used here stems from a tradition stressing
the importance of cyclical, or recurrent, networks as the basic unit of cognition. This
recurrent circuit is primarily inspired by Hebb’s construct of cell assemblies (1949).
The basic unit of this connectionist model, called TRACE (Tracing Recurrent Activity in
Cognitive Elements), is actually part of a larger, more complete cognitive architecture
(Kaplan, Sonntag & Chown, 1991). A strength of a modeling approach committed to
providing holistic explanations of cognitive function is that of confidence: no
component added to the architecture may, without reasonable evidence, require a
mechanistic change in an already existing component.
The history of cell assembly theory will now be discussed and will be followed by
descriptions of the two models from which the current model evolved. In the section
following, a phenomenon of speech perception will be introduced that provides rich
constraints for neural models of cognition. The body of data that was modeled—whereby
differences in response times varied with word-form relationships in a priming
paradigm—offered support for specific neural architectures and mechanisms that extend
the applicability of TRACE.
1.1 An overview of the TRACE and multiTRACE models
Hebb, with limited neurophysiological knowledge at his disposal, claimed that
some structure in the nervous system must exist so that stimuli from the environment can
persist long after the stimuli are no longer present. During development, individual
neurons, later to comprise a cell assembly, are grouped together based upon similarities
in the temporal nature of their firings. That is, the connection between one cell and
another is strengthened when both cells are firing simultaneously:
"Whenever an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." (Hebb, 1949, p. 62)
At the basic level, this simultaneous firing pattern indicates a similarity in responding to
some particular part of the stimulus, and so the two cells combined form a more complete
and robust description of that input.
Mechanisms by which connections are strengthened between two cells were not
known during Hebb’s time, but his intuition for the requirement of some sort of synaptic
change was enlightening. Although his proposed “learning rule” is in some form part of
every learning algorithm since, Hebb fell short in explaining the significance that
inhibitory factors may have on a network’s operation. The rampant spread of activity
that would result without inhibition or some sort of neural fatigue was witnessed by
early connectionist researchers of the 1950s who tried to simulate Hebb’s theories using
a digital computer (Rochester et al., 1956). These researchers, through simulation, were
able to show that Hebb’s theory alone is not sufficient for the formation of cell
assemblies and that additional mechanisms such as inhibition are necessary (Milner,
1957).
The mechanism for synaptic strengthening seems to start with a post-synaptic
event known as long-term potentiation (LTP), as evidenced in the hippocampus of the rat
(Muller, Joly & Lynch, 1988; Tocco et al., 1992). In addition to synaptic strengthening, a
reverse process has also been found to occur, long-term depression (LTD), whereby
connection strength decreases when the post-synaptic cell is active while the pre-synaptic
cell is inactive, and not the other way around (White et al., 1990; Abraham & Goddard,
1983). Hetherington and Shapiro (1993) have supported this post-not-pre LTD
mechanism through computer simulations of cell assemblies that are unique, persistent,
and reliable in their activation in response to a stimulus.
This recent research on mechanisms of synaptic modification, and its agreement
to a certain extent with Hebb’s original hypothesis, has prompted further investigation of
the cell assembly construct. Milner’s inhibitory component was one such advance on
Hebb’s theory, and more recent theories now include components of neural fatigue and
short-term connection strength (STCS) (Kaplan, Sonntag & Chown, 1991). This recent
model, called TRACE, attempts to dynamically simulate the functioning of a population
of neurons comprising a cell assembly through a system of difference equations. It does
not, however, model the formation of an assembly through the strengthening and
weakening of internal connections, and assumes the cell assembly to be well-trained.
This is a shortcoming in the model, but the reduced complexity makes possible an
analysis of how assemblies may interact with one another when TRACE units are
networked. Models that learn assemblies at the lowest level have failed to achieve this
level of analysis.
Because the underlying structure of an assembly is not taken into account by this
model, it may seem no different from the classic artificial intelligence symbol systems
discussed earlier. However, the TRACE model was developed with biological constraints
in mind because it is intended to model human cognitive function, not merely intelligent
behavior. That is, the behavior of the unit both temporally and mechanistically is based
on psychological and neurophysiological data. The issue of constraining a cognitive
model is of prime importance these days, and so it must not be overlooked (ch. 3 in
Newell, 1990). Neuroscience provides bottom-up constraints, such as the fact that the
speed of neural firing is known and the timing of neural circuits can be interpolated. On
the other hand, cognitive psychology provides experimental evidence indicating the
response time in which basic deliberate cognitive acts (i.e., recognition, choice selection)
occur. These are top-level constraints on any system intending to model, and therefore
explain, basic cognitive behavior.
An introduction to the models themselves will now proceed from the simple
TRACE unit to the definition of a MultiTRACE network. The main components of each
model will be discussed to the point where further extensions to the model should be
clear, but a more formal treatment of the model specifications can be found elsewhere
(Kaplan, Sonntag & Chown, 1991; Sonntag, 1991; Chown, 1994).
1.1.1 The original TRACE model
TRACE is a mathematical specification of the functional dynamics of a single cell
assembly. The most important and complicated component of the model is perseveration
(P), or activity, as it is most often referred to in a connectionist system. Activity
represents the combined activity of all the neural elements that comprise the cell
assembly. Unlike individual neurons, the activity of a cell assembly is determined by the
complicated feedback connections between units and so has the ability to reverberate for
quite a long time. This reverberation period has two distinct phases: perception and
primary memory (Fig. 1.1). Activity is dependent upon two main factors, which
correspond to the terms in the delta equation of Table 1.1.

Figure 1.1: A plot of activity (perseveration), short-term connection strength and fatigue over a period of about 4 seconds (t = 400) using the original TRACE equations and parameters (Kaplan, Sonntag & Chown, 1991).

The value of activity can be thought of as the percentage of internal units firing at that
point in time. Therefore, the value 1 – P
represents the percentage of units still capable of firing to increase the level of activation
of the assembly as a whole. The first term in the delta equation for P is the rate of growth
of activity and is determined by how sensitive (V) the assembly is to activation due to
fatigue, long-term and short-term connection strength and how many of the internal
members have not yet fired (1 – P):
ΔP_rise = (P + I·(1 − P)) · (1 − P) · V
Alternatively, the term expressing the decline of activity is sensitive to neurons
forming an assembly dropping out due to fatigue (θl) as well as a competitive inhibition
(θc) from other cell assemblies (Milner, 1957):
ΔP_decline = (θ_l·P + θ_c·P·(1 − P)) · (1 − V)
This inhibitory factor was approximated in the original TRACE model because it
is a model of a single cell assembly. MultiTRACE models can explicitly define this
degree of inhibition, and this will become important for the implementation of lateral
inhibition.
The sensitivity (V) of a cell assembly to fire is determined, as noted earlier, by the
strength of internal connections (LTCS), short-term connection strength, and fatigue:
V = (L + S) · (1 − F) · v
Short-term connection strength and fatigue both have update equations that are similar to
activity, that is, they have both growth and decay terms:
ΔS = σ_g·P²·(1 − S) − σ_d·S
ΔF = φ_g·P²·(1 − F) − φ_d·F
The behavior of these two components can be seen in comparison to activity in Fig. 1.1.
Fatigue will become much more important to the model when multiple TRACE units are
embedded in a network. For now, it suffices to know that fatigue is the main factor in
stopping the rise of activation. Short-term connection strength is a factor that increases
the ability of one neuron to activate another when the first initially begins to fire, and it
is substantiated by a set of experiments performed in the 1960s and by more recent
neurophysiological data (Kleinsmith & Kaplan, 1963; Magleby, 1987).
Table 1.1: The system of equations that describe the functional dynamics of a basic TRACE unit. A time step, t, corresponds to approximately 10 ms (Kaplan, Sonntag, & Chown, 1991).

Update equations:
P(t + 1) = P(t) + ΔP(t)
F(t + 1) = F(t) + ΔF(t)
S(t + 1) = S(t) + ΔS(t)
L(t + 1) = L(t) + ΔL(t)
I(t) = α when 0 < t < δ; 0.0 when t > δ

Delta equations:
ΔP = (P + I·(1 − P)) · (1 − P) · V − (θ_l·P + θ_c·P·(1 − P)) · (1 − V)
V = (L + S) · (1 − F) · v
ΔF = φ_g·P²·(1 − F) − φ_d·F
ΔS = σ_g·P²·(1 − S) − σ_d·S
ΔL = 0.0

Variables:
P(t): perseveration (activity)
F(t): fatigue
S(t): short-term connection strength
L(t): long-term connection strength
I(t): external input

Parameters:
θ_l: unit loss
θ_c: inhibitory competition
v: normalization factor
φ_g: fatigue growth
φ_d: fatigue decline
σ_g: STCS growth
σ_d: STCS decline
α: input amplitude
δ: input duration
Because a TRACE unit represents one well-learned cell assembly, it does not
model the alterations of synaptic weights between internal neurons. Therefore, internal
LTCS does not change at all during the simulation:
ΔL = 0.0
Lastly, a cell assembly is activated by a general input, I. This input is a stimulus
that the cell assembly has theoretically represented through the feedback connections in
between the member neurons. The input, therefore, has a magnitude which is the value
of I and a duration that tells the simulator how long to present the stimulus (Table 1.1).
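As a concrete illustration, the difference equations in this section can be iterated directly. The sketch below is a minimal, hypothetical simulation of a single TRACE unit based on one plausible reading of those equations; the parameter values are illustrative placeholders, not the published ones.

```python
# Minimal sketch of a single TRACE unit iterating the difference
# equations of this section. Parameter values are illustrative, not
# the published ones; L (internal LTCS) stays fixed, since delta-L = 0.

def simulate_trace(steps=400, alpha=1.0, delta=10, L=1.0,
                   theta_l=0.05, theta_c=0.05, v=0.5,
                   phi_g=0.015, phi_d=0.004,
                   sig_g=0.1, sig_d=0.02):
    P = F = S = 0.0
    history = []
    for t in range(steps):
        I = alpha if 0 < t < delta else 0.0        # square input wave
        V = (L + S) * (1 - F) * v                  # sensitivity
        rise = (P + I * (1 - P)) * (1 - P) * V
        decline = (theta_l * P + theta_c * P * (1 - P)) * (1 - V)
        dF = phi_g * P**2 * (1 - F) - phi_d * F    # fatigue
        dS = sig_g * P**2 * (1 - S) - sig_d * S    # short-term strength
        P = min(1.0, max(0.0, P + rise - decline))
        F = min(1.0, max(0.0, F + dF))
        S = min(1.0, max(0.0, S + dS))
        history.append((P, F, S))
    return history
```

Plotting P, F and S from this loop gives a rough analogue of Fig. 1.1, though the exact time course depends strongly on the chosen parameter values.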
1.1.2 MultiTRACE
The simplification requirements to successfully model a single cell assembly no
longer apply when TRACE units are set in a network environment and connected with
one another. The interactions of units can result from more “connectionist-like”
mechanisms, instead of being approximated by a mathematical model. Model
components that are improved by a network environment include input, inter-unit
connectivity (LTCS), STCS, and fatigue. Each of these improvements will be
discussed to an introductory level so that the further improvements made in this study can
be understood in the framework of MultiTRACE. It should also be noted that all
MultiTRACE equations are based on Chown’s implementation, which was in turn
derived from the original specification by Sonntag (Chown, 1994; Sonntag, 1991).
First, the input function of a TRACE unit embedded in a network begins to
approach that of a typical connectionist simulation. That is, input is divided into both
excitatory and inhibitory types. Because units are interconnected with other units in
MultiTRACE, both the inhibitory and excitatory input for a given unit is determined by
the classic connectionist summation method (note activity/perseveration is now denoted
as A, to conform to connectionist standards):
I_net = I_exc − I_inh

I_exc = Σ_k input_exc_k(t), where input_exc_k(t) = (w_jk + s_jk) · A_j(t)

I_inh = L · Σ_m input_inh_m(t) + G · Σ_{i=1…N} A_i(t), where input_inh_m(t) = w_jm · A_j(t)

k, m: number of excitatory and inhibitory inputs
N: number of units in a layer
G: global inhibition factor (currently set to 0.5)
L: lateral inhibition factor (currently set to 0.2)
The implication of these changes to input is that units can now be affected and activated
by other units. Psychologically, a connection between two units refers to an association
between two active symbols. The behavior of these units allows for a more complex
interaction between “symbols” than a mere association, however, and this lies at the
essence of the multiTRACE model in explaining psychological phenomena (Kaplan,
Weaver & French, 1990).
Connectivity between units, denoted by w_jk in the traditional manner,
represents the strength with which unit j excites unit k. The unidirectional nature of the
connection stems from the fact that typical neurons are one-way devices. Units also
have a short-term effect (STCS) on other units in a network. Short-term connection
strength (s_jk) between units in multiTRACE is considered to be a separate mechanism
from STCS within a TRACE unit. STCS in TRACE was explained at the level of the
neuron as a temporary facilitation of one neuron’s firing at the onset of rapid firing in
the pre-synaptic cell (Magleby, 1987). It will not be useful to delve into the differences of STCS at the
inter-unit level, because this factor is more important for inter-unit learning which is not
modeled here. Further information about STCS in a multiTRACE environment can be
found elsewhere (Chown, 1994).
Inhibitory input is computed differently from excitatory input in this version of
multiTRACE because it has two components, whereas excitatory input has one. Total
inhibition imposed on a cell assembly in this model can be
decomposed into regional inhibition and local (or lateral) inhibition. Regional inhibition
is a mechanism whereby the spread of activity is controlled and so acts as a negative
feedback system. Regional inhibition is approximated in this model by summing the total
network (or layer) activity and returning this value adjusted by a constant.
Separately, local inhibition on a cell assembly comes from other assemblies in the
region. Inhibitory connections separate from excitatory connections are maintained in the
current implementation of the model and real number weights are used to indicate the
strength of inhibition from the source to a target. Like that of regional inhibition, the
summed total of local inhibition is adjusted by a multiplicative factor.
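Putting the two inhibition components together, the net input to a unit could be sketched as follows. This is a hypothetical convenience layout (tuples of weights and source activities), based on one plausible reading of the input equations in this section; G and L are the global and lateral factors (0.5 and 0.2 in the text).

```python
# Sketch of net-input computation for one multiTRACE unit. The tuple
# representation of connections is a hypothetical convenience.

def net_input(exc, inh, layer_activity, G=0.5, L=0.2):
    """exc: list of (w_jk, s_jk, A_j) excitatory connections.
    inh: list of (w_jm, A_j) inhibitory connections.
    layer_activity: activities of all N units in the layer."""
    I_exc = sum((w + s) * A for (w, s, A) in exc)
    regional = G * sum(layer_activity)            # negative-feedback term
    lateral = L * sum(w * A for (w, A) in inh)    # local competition
    return I_exc - (regional + lateral)
```

Summing total layer activity for the regional term captures the negative-feedback role described above: the more active a layer becomes, the more every unit in it is suppressed.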
1.2 Speech perception
A cognitive model gains support when it can be implemented and applied to some
psychological domain. Additionally, a model can be improved simultaneously because
an application will provide additional constraints that in many cases may clarify
ambiguities or suggest possible new mechanisms. This project was concerned with
modeling a particular aspect of speech perception known as lexical contact. Lexical
contact is defined as the phase whereby the representations activated by the speech input
make initial contact with the lexicon (Frauenfelder & Tyler, 1987). This process, and
speech in general, presents a difficult challenge to cognitive models because the task is
highly temporal in nature. Dealing with sequential stimuli has been a weakness of classic
connectionist models.
1.2.1 Lexical priming
Specifically, the cognitive model described in this introductory section will be
used to simulate and hopefully replicate the results of a particular set of experiments from
the field of psycholinguistics. The data from Slowiaczek and Hamburger (1996) suggest
that phonologically similar words compete at the lexical (word) level of speech
recognition. This competition was observed in a priming paradigm in which the
primes were phonologically related to target words by the number of initial phonemes.
These data provide both architectural and mechanistic constraints on phonological and
lexical processing, and so are crucial to validating this connectionist model. The early
part of the next section will present a thorough overview of this connectionist architecture
and the method by which lexical contact will be simulated using multiTRACE.
2. Speech Simulation
The primary goal for this project was to simulate, as accurately as possible, the
phase of speech perception called lexical contact. The neural model at the heart of the
simulation is motivated by behavioral evidence that constrains its design. One such
constraint is that of hierarchy. Using spoken-word priming experiments, Slowiaczek and
Hamburger (1992) observed two dissociated effects, termed phonological facilitation and
lexical interference, that suggested a connectionist architecture whose lexical and
phonological representations are separate. Facilitation, or a decrease in response time,
was observed during low word-similarity conditions. Alternatively, response times
increased when word similarity increased to three phonemes of overlap. These two results
provided suggestions for different relationships in the hierarchy. In hierarchical systems,
vertical refers to between-level relationships while horizontal corresponds to
relationships between members of the same level. The first effect, phonological
facilitation, supports a design whereby phonological representations excite separate
lexical representations in a vertical manner. On the other hand, the interference effect
suggests that words with similar initial phonological structure inhibit one another
horizontally. With these constraints in mind, a connectionist simulation was developed
that was capable of mimicking the experiments performed by these researchers
(Slowiaczek & Hamburger, 1992; Hamburger & Slowiaczek, 1996).
Several simplifications in the simulation design were necessary in order to focus
on lexical contact. The input stream, which in actuality consists of compressions of air
impinging on the peripheral auditory system, was modeled—as is typically done in the
field—as an incoming stream of phonemes. The assumption of a linear input of discrete
and invariable units is a large one, but was necessary in order to abstract to the level of
analysis used here.
The output of the simulation, given some serial input of phonemes, is a
recognized word. Therefore, this connectionist model is simply a mapping, albeit
complex, between a series of small, temporally-distinct sound units and a unit word
representation. Furthermore, this mapping is encoded in the activity of phoneme units
excited by external input that in turn cause the excitation and subsequent activation of
lexical (word) units:
Speech → phoneme unit activity → lexical unit activity → Word
Given this scheme, there is no reason to require that the units representing
phonemic information directly cause the activation of lexical units. That is, phoneme
unit activity in this model is essentially only a neural implementation of the external
physical stimulus and does not provide additional information to the recognition process.
Additional layers can be inserted into this model without restraint and still the basic
transformative process is retained, whereby phoneme unit activity maps to lexical unit
activity, but perhaps indirectly:
Speech → phoneme activity → intermediate activity → lexical activity → Word
In actuality, much of the human cortex is organized in a manner similar to this
mapping. The mammalian cerebral cortex is organized into six layers, and throughout
the brain it is typically the case that the same layers act as input layers (IV) and others
as output layers (V–VI). Additionally, unlike the inner layers, the outermost layers (I–III)
usually do not receive or send out any long-distance neural processes but instead act more
locally as areas of intermediate processing, receiving information from input layers and
sending projections to output layers (Calvin, 1995). Although this is highly simplified,
this rough approximation of cerebral organization suggests a hierarchical theme for
cerebral processing, one that is directly applicable to the perceptual process under
examination.
The layered connectionist architecture that was used to model lexical contact here
was motivated by the psycholinguistic evidence (Slowiaczek & Hamburger, 1992).
However, an additional layer was inserted into the hierarchy because a 2-tiered network
failed to achieve successful recognition capabilities. As noted above, though,
this does not necessarily conflict with the supporting evidence, because phonological
representations still map, albeit indirectly, to lexical representations. Therefore, the final
architectural design consisted of three distinct layers—phonemic, transitional and
lexical—which will be discussed in the next three sections.
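Structurally, the three-layer design amounts to a pair of feedforward mappings. The skeleton below is a hypothetical sketch of that wiring only; unit dynamics, inhibition, and timing are elided.

```python
# Skeleton of the three-layer architecture: phonemic -> transitional ->
# lexical. Only the feedforward wiring is shown; unit dynamics and
# inhibition are elided.

class Layer:
    def __init__(self, names):
        self.activity = {n: 0.0 for n in names}  # one value per unit

def propagate(src, dst, weights):
    """weights maps (src_unit, dst_unit) -> connection strength w_jk."""
    for (j, k), w in weights.items():
        dst.activity[k] += w * src.activity[j]
```

For example, activating /b/ and /eh/ in a phonemic layer would excite a /beh/ transitional unit in proportion to the connection weights.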
2.0.1 Phonemic layer
The phonemic representation of the speech input is merely an arbitrary
association of short speech sounds with cognitive representations. Therefore, the
phonemic layer in the network consists of cell assembly units that represent, in an
abstract sense, the phonemes of a language, which in this case closely mirror those of
English. Table 2.1 shows the phonemes that were accounted for in the current model,
but it should be noted that the completeness and accuracy of this list are not important,
because ideally linguistic processing should function regardless of actual language
specifics. Phoneme categories were used in the definition of the regular language (a
series of rules, described later, used to build the word maps) used to generate “legal” words in the
lexical layer. It should also be noted that every cell assembly modeled in this particular
system is identical with respect to its parameters, and the crucial ones are listed in Table
2.2. These values were determined experimentally to achieve cell assembly units with a
time-course of activity appropriate for speech perception (high-refresh and fast-acting).
Table 2.1: List of phonemes represented in the modeled language network.
Phoneme category	Represented phonemes
Vowels /ue/, /oo/, /oh/, /uh/, /eh/, /ah/, /ay/, /iy/, /ee/, /oy/
Stop consonants /b/, /p/, /d/, /t/, /g/, /k/
Fricatives /v/, /f/, /th/, /sh/
Nasals /m/, /n/, /nk/, /nd/, /ng/
Glides /l/, /r/
Table 2.2: Layer parameters
Parameter	Value
Fatigue growth 0.15
Fatigue decline 0.04
STCS growth 1.0
STCS decline 0.2
* STCS: Short-term connection strength
Input to the phonemic layer is completely external, reflecting the fact that a more
primary auditory system, not modeled here, provides the innervation (Fig. 2.1). There
is good neurophysiological evidence that this is indeed the case, as the primary auditory
cortex has been found to process complex temporal events in the sound stimulus —
including linguistic sounds— in both monkeys (Steinschneider, Schroeder, Arezzo, &
Vaughan, 1995) and in humans (Liégeois-Chauvel, Laguitton, & Chauvel, 1999). Both
neural imaging experiments (Zatorre, Meyer, Gjedde, & Evans, 1996; Binder, Frost,
Hammeke, Cox, Rao, & Prieto, 1997) and event-related potential (ERP) studies (Celsis,
Doyon, Boulanouar, Pastor, Démonet, & Nespoulous, 1999) also suggest that
phonological representations formed in the left secondary auditory cortex (left
temporoparietal regions) are activated as a result of this initial temporal processing.
Therefore, when speech is being received in the current model, the corresponding
phoneme representation is presented with a square input wave (amplitude = 1.0, duration
= 100 ms) that causes an activation period of about 310 ms. There is no horizontal
connectivity in this layer, and all outgoing connections map onto the transitional layer
which is described next.
Figure 2.1: Input to the phoneme layer is completely externalized here in that it is not directly modeled. Instead, it is assumed that these phonological representations lie in the secondary auditory cortex (or an association area) and receive processed auditory information from the primary auditory cortex.
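The input regime just described (a square wave of amplitude 1.0 lasting 100 ms, with one time step corresponding to roughly 10 ms) can be written as a simple schedule. The function name and onset argument below are illustrative, not part of the original simulator.

```python
# Square-wave external input to a phoneme unit: amplitude 1.0 for
# 100 ms from stimulus onset, 0.0 otherwise. (In the simulation, one
# time step corresponds to about 10 ms, so the duration is 10 steps.)

def phoneme_input(t_ms, onset_ms, amplitude=1.0, duration_ms=100):
    on = onset_ms <= t_ms < onset_ms + duration_ms
    return amplitude if on else 0.0
```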
2.0.2 Transitional layer
When the need for an intermediate layer between the word and phoneme units
arose for reasons discussed later, a decision was required as to what would be represented
in this layer. Because speech is a sequential stimulus and order is relevant, a speech
parser should be very sensitive to this characteristic. This suggests that it may be useful
for an intermediate level of processing to help resolve this temporal order.
Phonetic invariance, the property that a given phonetic sound has acoustical
properties that remain consistent across all its instances in spoken language, is not a
characteristic of human languages (Pisoni & Luce, 1987). For example, the acoustic
signature of a consonant preceded by one vowel may be different from its signature when
preceded by a different vowel. Additionally, the spectral characteristics of this consonant
when it comes before a vowel may look even more different. Therefore, it is quite likely
that humans use this “transitional” information between consonants and vowels to help
determine relative order. This effect was demonstrated by Jenkins, Strange and Erdman
(1983), who showed that the recognition of a vowel in a perturbed stimulus
was most successful when the stimulus retained the consonant-vowel transitional
information and omitted the center of the vowel and the consonants.
Because the acoustic signature of a syllable and its reverse form are different, it is
highly likely that they have separate representations in the cognitive system. For
example, there would be a distinction made between /eb/ and /beh/, if in fact there are
different neural representations for both. This cognitive distinction, when added to the
model, provided a marked improvement in word recognition during early testing. The
improved performance does not come without a cost, however. In the artificial neural
network used here, for instance, the representation of all possible legal biphones (legality
of a word is defined in the next section) adds 380 units to a system containing 27
phoneme units.
The only difference between the two complementary forms of a transitional or
biphone unit (i.e., /bi/ and /ib/) is the input that it receives from the phoneme layer.
Because there is not sufficient evidence to bias either form over its complement, each
connection was made equal in strength. Likewise, lateral inhibition was not applied
either, because there is no evidence to support it at this level. Future research on this
topic might involve implementing this transitional layer with inhibition, however. When
inhibition is not present and the size of the input increases, competition at the transitional
level increases, resulting in even greater competition (due to more word representations
being activated) at the lexical level that may not be resolved correctly. This will be
evidenced by experiments discussed later in which error rates in performance have room
for improvement.
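The phoneme-to-transition decomposition just described can be sketched briefly. The helper below is illustrative only (the string encoding of biphone units is invented for the example), but it captures the two properties the layer needs: each word activates one transitional unit per adjacent phoneme pair, and a sequence and its reverse activate different units.

```python
def biphones(phonemes):
    """Ordered biphone (transitional) units for a phoneme sequence."""
    return [phonemes[i] + phonemes[i + 1] for i in range(len(phonemes) - 1)]

# A sequence and its reverse activate different transitional units:
assert biphones(["b", "e"]) != biphones(["e", "b"])

# A 5-phoneme word (the longest in the modeled lexicon) has 4 transitions:
print(biphones(["b", "l", "a", "s", "t"]))  # ['bl', 'la', 'as', 'st']
```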
2.0.3 Lexical layer
The final layer in the system is, of course, the word or lexical layer. This layer
receives all of its input from the transitional units below and is built by an algorithm that
generates random words from an arbitrarily predefined language that resembles a subset
of the English language (but which also contains words that do not exist in English). All
words are monosyllabic and contain a maximum of five phonemes. This restriction was
made because it is hypothesized that multi-syllabic words add additional complexity to
the recognition process in that they most likely require additional levels of processing.
The strength of a transitional→lexical connection, in this model, is a function of
the serial position of the transition (or biphone) in the word. This aids the simulated
cognitive system in biasing the order of transitions, thereby differentiating between words
that share common transitional units, and is a product of the learning process. Primacy is
a strong property of sequence learning, so using it as a rule for assigning weights seems
plausible. Recency effects of learning sequences are not applicable here, however,
because recency effects are traditionally associated with short-term memory processes
that decay over time, and so do not relate to the long-term connection strength properties
being discussed here (Murdock, 1962).
A rule for building connection weights between the transitional and lexical units
was implemented based upon this theory. The weight of a connection from a transition to
a lexical assembly decays linearly by a constant factor of 1/10 with increasing position in
the word. For the longest words in the modeled lexicon (5 phonemes/4 transitions), the
last transition in the word sequence has ~40% the strength of the primary position.
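One reading of this primacy rule can be sketched as follows. The 1/10 decrement per position is taken from the text; the first-position weight of 0.5 is an assumption chosen so that the fourth transition carries 40% of the primary position's strength, matching the figure quoted above.

```python
def transition_weight(position, w_first=0.5, decrement=0.1):
    """Weight from the biphone at `position` (1-based) to its word unit.
    Decays linearly by a constant 1/10 per position (the primacy rule);
    w_first = 0.5 is an assumed base chosen to reproduce the ~40%
    last-to-first ratio for 5-phoneme (4-transition) words."""
    return w_first - decrement * (position - 1)

weights = [round(transition_weight(p), 2) for p in range(1, 5)]
print(weights)                              # [0.5, 0.4, 0.3, 0.2]
print(round(weights[-1] / weights[0], 2))   # 0.4
```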
2.1 Lateral inhibition and lexical competition
With the general framework for the connectionist model described, the crucial
component and main hypothesis of this research will now be examined. The usefulness
of hierarchy in a cognitive symbol system is that as one proceeds upwards in the
hierarchy, elements of lower levels can be substituted by singular elements. This
substitution, which is similar to the process of “chunking” in learning serial lists, can
facilitate cognitive processing. For example, modality-specific information can be
transformed, through substitution, from information specific to that modality to a general
and common framework of “concepts”. This relates to the general theory of associative
cortex as an area where multi-modal integration occurs.
The flexibility gained through cognitive hierarchies also introduces problems of
control. Specifically, when discussing networks of cell assemblies or populations of
neurons, the control issue lies in restraining the spread of activity. The pattern of
connectivity in neural systems from lower levels to higher levels of a hierarchy is not
necessarily restricted to a many-to-one mapping, but is instead a many-to-many
relationship. In this more complicated type of mapping, a true substitutive effect is not
met with regards to connectivity (Fig 2.2). An additional mechanism is therefore
necessary to produce “winner-take-all” behavior (for a good description of winner-take-
all connectionist networks, see Feldman & Ballard, 1982).
Figure 2.2: In (1) above, elements {a,b,c} and {d} are fully substituted by elements E and F, respectively. It can be seen that all of the elements in the left space are “represented” in the right space and these representations do not overlap. However, in (2), which is a many-to-many mapping, the elements {a,b,c,d} are not represented without overlap as both b and c map to both E and F.
Peripheral visual areas of most animals include a neural mechanism that is useful
to achieve this substitutive goal. Lateral inhibition is an organizational and chemical
mechanism, first observed in the simple eye of the Limulus (e.g., Brodie, Knight &
Ratliff, 1978), capable of increasing the contrast level of an incoming visual stimulus.
Some retinal ganglion cells have receptive fields (essentially, the group of retinal
receptors in a lower level that project to this cell) such that the ring around the center
provides inhibitory input and the center area, excitatory input (Fig. 2.3). Therefore, the
cell is most excited when acute differences in light occur in the stimulus, and not diffuse
patterns. By accentuating these differences in the light stimulus, the cell provides a kind
of filter of the important edge data relevant to interpreting a visual scene.
Figure 2.3: (1) Center-on, off-surround retinal ganglion cell. The cell is most excited when only the center of its receptive field is receiving a light stimulus. The area surrounding this center provides inhibitory input when lit. (2) Shaded portions indicate no light in the corresponding part of the cell's receptive field. (3) The cell's activity (firing frequency) given the shading regime directly above each datapoint.
The importance of inhibition in the neural system is demonstrated by the
incredible success of animal visual systems. Although the lateral inhibition described
above refers to a level-to-level inhibitive mechanism, the same general technique can also
be applied to a same-level paradigm, where adjacent regions of some cognitive space
inhibit their neighboring regions. The effect of this kind of organization is a
focusing of activity, exactly what is needed to approach a winner-take-all type of
behavior.
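A toy simulation can illustrate this focusing effect. The update rule and parameter values below are illustrative assumptions, not the thesis's equations; the point is only that mutual inhibition lets a unit with even a slight excitatory advantage suppress its rivals, approximating winner-take-all behavior.

```python
def step(activity, inputs, inhibition=0.7, decay=0.2):
    """One synchronous update: each unit receives its external input minus
    inhibition proportional to the summed activity of its competitors."""
    new = []
    for i, a in enumerate(activity):
        rivals = sum(activity) - a
        new.append(max(0.0, a + inputs[i] - inhibition * rivals - decay * a))
    return new

activity = [0.0, 0.0, 0.0]
inputs = [1.00, 0.98, 0.95]        # unit 0 has only a slight advantage
for _ in range(50):
    activity = step(activity, inputs)

# Unit 0 dominates; its competitors are fully suppressed.
print([round(a, 2) for a in activity])
```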
2.1.1 Lexicon organization
Psycholinguistics has produced solid evidence suggesting that the lexicon is
organized based upon word structure. One such line of research in the domain of speech
perception uses a task called single-word shadowing to help determine the structural
relationships of words in lexical memory. The stimuli for a single-word shadowing task
consist of two words, a prime followed by a target, both spoken aloud. The
participant’s task is to simply repeat the target word aloud as quickly and as accurately as
possible. Slowiaczek (1994) validated this technique as a means of measuring the access
to lexical memory as opposed to simply accessing an acoustic buffer, by showing that the
recognition of a target was facilitated by a semantically related prime. Utilizing this
technique, two other studies were able to shed light on the organization of the lexicon by
manipulating phonological relationships between targets and primes, and observing how
these relationships affected response time (Hamburger & Slowiaczek, 1996; Slowiaczek
& Hamburger, 1992). The basic experimental design consisted of target words paired up
with primes that varied in the amount of initial phonological overlap (e.g., 2-phoneme
overlap: blast and block). An effect that was consistent across both studies was that the
response time in the 3-phoneme overlap condition (high similarity) was significantly
slower than that of the low similarity conditions (1- and 2-phoneme overlap), but not
necessarily slower than the 0-phoneme overlap condition.
Slowiaczek and Hamburger (1992) proposed a connectionist model consisting of
a prelexical and a lexical level whereby inhibitory connections between phonologically
similar words were strong. This intuitive architectural design would explain the
interference phenomenon observed in the high similarity condition. In the high similarity
condition, the effect of the inhibition is maladaptive because recognition is slower.
However, lateral inhibition at the lexical level should be effective in general by resolving
the competition between words being activated by the speech input. In this paradigm,
lateral inhibition acts as an automatic gain control, where gain in this environment can be
defined as the ratio between activity and incoming input. By suppressing its neighbors
which are providing inhibitory input, a unit can automatically increase its gain if it has
even the slightest competitive advantage in excitatory input.
In the network model discussed here, lateral inhibition is implemented as separate
inhibitory connections between lexical cell assemblies. The level of inhibition is directly
proportional to the degree of initial phoneme overlap (e.g., /blast/ and /blak/ inhibit one
another more than /blast/ and /blue/). If the lexical map is thought of as a space,
phonological overlap in a geometric sense becomes a cognitive distance metric.
Kinsbourne (1982) refers to the phenomenon that similar concepts tend to interfere with
one another more than dissimilar ones as the “cerebral distance principle”.
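This overlap-based scheme can be sketched as follows. Proportionality to initial overlap is stated in the text; the exact scaling (overlap count times a constant factor) is an illustrative assumption for this sketch.

```python
def initial_overlap(w1, w2):
    """Number of shared word-initial phonemes (words given as phoneme lists)."""
    n = 0
    for a, b in zip(w1, w2):
        if a != b:
            break
        n += 1
    return n

def inhibitory_weight(w1, w2, factor=0.7):
    """Lateral inhibition between two lexical assemblies, taken here to be
    directly proportional to initial overlap; the exact scaling (overlap
    count times a constant factor) is an illustrative assumption."""
    return factor * initial_overlap(w1, w2)

blast = ["b", "l", "a", "s", "t"]
blak = ["b", "l", "a", "k"]
blue = ["b", "l", "u"]

# /blast/ and /blak/ inhibit one another more than /blast/ and /blue/:
print(round(inhibitory_weight(blast, blak), 1))  # 2.1
print(round(inhibitory_weight(blast, blue), 1))  # 1.4
```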
2.2 Sampling lexical networks
The implementation described above makes a number of assumptions about the
learning of words and the product of learning—long-term connection strength between
cell assemblies. This is a shortcoming of a model that is not grounded in a physical
stimulus (either simulated or real) which learns weights directly from a developmental
perspective. That is, the networks used in this simulation are mature and were not
trained. Because this is a mathematical model of a cognitive system, the weights will
determine its behavior, and so it is important to make sure that the performance of the
system is consistent across some experimental range of assigned connection values. This
experimental range was determined more or less through trial and error as there was little
information available to use as a gauge of how large this range should be. In spite of this,
a method of assigning weights from a normalized distribution was imposed so that the
notion of a simulated “subject” could be defined and standard statistical analyses could
be conducted.
Each level in the network was assigned mean connection strength values
associated with incoming connections from the previous layer (Table 2.3). These values
were determined experimentally to attain a system that performed with a reasonable error
rate (< 5% error) for a network composed of 169 words. The connection strengths were
sampled from a normal distribution with a standard deviation of ± 5% from these mean
values. The task used in this initial exploration of parameters was to achieve a correct
mapping from phonemic input to lexical activity.

Table 2.3: Mean long-term connection strength values and standard deviations used in the final experimental network architecture.

Connection               Strength   Standard deviation
Phonemic→transitional    0.5        ± 5%
Transitional→lexical     0.3        ± 5%

The lexical unit with the
highest overall activity level during a period of 750 ms after the onset of the stimulus was
deemed to be “recognized”. It should be noted that with a network of approximately
160-250 words, the behavior of the system never approached true winner-take-all in the
general case, although many trials did show a clearly dominant winner. Sometimes, a
“winner” for a trial did not dominate (operationally defined in section 3) very much at all,
although its activity was mathematically the highest.
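The sampling procedure can be sketched as follows. The means and the 5% standard deviation come from Table 2.3; the flat per-layer weight lists and the connection count are simplifications invented for the illustration.

```python
import random

# Mean connection strengths from Table 2.3; each simulated "subject" samples
# every weight from a normal distribution with SD equal to 5% of the mean.
MEANS = {
    "phonemic->transitional": 0.5,
    "transitional->lexical": 0.3,
}

def sample_subject(n_connections, rng=random):
    """Return per-layer weight lists for one simulated subject.
    (The flat per-layer list is a simplification of the real connectivity.)"""
    subject = {}
    for layer, mean in MEANS.items():
        sd = 0.05 * mean
        subject[layer] = [rng.gauss(mean, sd) for _ in range(n_connections)]
    return subject

random.seed(0)                      # one "subject" drawn reproducibly
s = sample_subject(1000)
print(round(sum(s["phonemic->transitional"]) / 1000, 2))   # close to 0.5
```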
2.2.1 “Subject” networks
In order to run experiments, collect data, and conduct statistical analyses, the
notion of a subject was defined. First, let it be noted that every complete network
contained a set of words sharing the same initial phoneme. Modeling the dynamics of
lexical competition was the goal here and the behavioral evidence dictated that
phonological similarity might serve as a distance metric. Therefore, these networks
represent patches of cortex and the scale of the system that could be supported
computationally was constrained to words beginning with the same phoneme.
Subjects in real behavioral experiments are most often randomly sampled from an
assumed normal population. To mimic the variability of a normal population, the
connection weights were randomly sampled from a normal distribution of real numbers,
since in this particular model, the connection strengths are the variables that are
influenced by different life experiences, educational background, et cetera, that would be
found in real human subjects. Because we are assuming normality, it is likely that
subjects drawn from a population of native language speakers will have very similar
representations for monosyllabic words as well as the basic articulated sounds of the
language, so this simulation is well-founded.
Additionally, the set of words represented by any particular network is free to vary
or remain the same across a group of subjects. By varying the word sets between subject
groups and obtaining significant main effects, the confidence of the statistic can be
elevated because it shows that the effect is robust across several datasets. Therefore, it
was the preferred method to use a random word set for one group of subjects, each
having individual network structures with respect to connection weights, and produce a
new randomized word set for another set of subjects.
A goal for this section was to provide the architectural framework for the
neurally-plausible model being proposed. The key architectural decisions—hierarchy
and the distribution of both excitatory and inhibitory connections—were made while
observing the constraints imposed by the behavioral evidence. It will be determined in
the next section whether the network, as defined here, will be capable of effectively
modeling the actual language processing phenomena that have been discussed.
3. Simulating the lexical priming experiments
A motivation for simulating the experiments carried out by Slowiaczek and
Hamburger (1992; 1996) is to provide support for the multiTRACE model which has
good theoretical grounding, but weaker empirical support. Simulation, though artificial,
can be a powerful tool whereby hypothetical mechanisms can be implemented to some
degree of realism and then tested using objective experiments. Additionally, it is the
hope that a simulation effort, if successful in replicating real phenomena, might suggest
additional hypotheses that can be tested further. That is, a simulation becomes extremely
useful if it is shown to have predictive value for subsequent behavioral experimentation.
The first experiment that will be described in this section corresponds to the general
priming experiment designed by Slowiaczek and Hamburger (1992; 1996).
3.1 Experiment 1
Producing a network that performed well was an iterative process, because there
are many factors that affect its behavior. The goal of this initial section is to describe the
first successful experiment that was run, in the sense that parameterization issues had
been solved and the network was performing well. A later section will be reserved for
discussing the additional issues that arose on the way.
3.1.1 Design
Thirty-seven lexical networks were built by the methods described previously
containing representations for 169 words. The words used were monosyllabic and were
standardized to 4 phonemes in length in order to simplify the calculation of response
time. Three different word sets were used between the subjects, but data was not
analyzed with respect to this variable because the representations in the network are
unrelated to actual word labels or sounds, and these are merely arbitrary assignments
given to keep track of their individual behavior. Also, in all experiments, words that
contained redundant phonemes were not used as either a prime or target item because the
results in these cases were problematic. That is, it was not clear how to assign connection
weights between sub-word and word units when more than one connection was required.
Each generated “subject” was tested on 15 target words, which were paired with a
set of three primes that varied in initial phonological similarity with the target. Rather
than using a completely unrelated prime word as the control condition, the
baseline was a trial that did not include a prime. This was required because the networks
contained only representations of words with the initial phoneme in common. Again,
these networks model patches of cortex, not the entire lexicon.
As in the behavioral procedure, the prime word during each trial was “spoken”
aloud 500 ms (modeled time) before the target word, measuring time from the end of the
prime stimulus. Each word tested was broken down into its constituent phonemes and
presented to the network at a standard rate of 1 phoneme every 100 ms. Response time
was recorded on successful trials; since the ability to speak the target aloud was not
within the simulation’s capabilities, an alternative measure was used. An error-free trial
was defined earlier as one in which the correct representative cell assembly attained the
highest activity level within a 750 ms window after the onset of the target item. The
response time, therefore, was measured from the onset of the target item to the time at
which the target cell assembly was the most active representation for three consecutive
time steps. This window of domination corresponds to 30 ms. Several observation
sessions were performed by the experimenter to verify that this was a good objective
measure.
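The recognition criterion just described can be sketched as follows. The 10 ms step, the 750 ms window, and the three-consecutive-step rule come from the text; the data structure and the choice to report the onset of the dominance run are assumptions made for this illustration.

```python
STEP_MS = 10                      # one simulated step = 10 ms

def response_time(traces, target, window=3, max_steps=75):
    """traces: dict mapping unit name -> activity per step, aligned to target
    onset. Returns the RT in ms, or None (an error trial) if the target is
    never the single most active unit for `window` consecutive steps within
    the 750 ms (75-step) window. The RT reported here is the onset of the
    dominance run, one of two readings of the criterion."""
    run = 0
    for t in range(min(max_steps, len(traces[target]))):
        winner = max(traces, key=lambda u: traces[u][t])
        run = run + 1 if winner == target else 0
        if run == window:
            return (t - window + 1) * STEP_MS
    return None

traces = {
    "brof":  [0.1, 0.2, 0.5, 0.6, 0.7, 0.8],   # target; overtakes at step 2
    "brosh": [0.3, 0.4, 0.4, 0.3, 0.2, 0.1],   # prime-like competitor
}
print(response_time(traces, "brof"))   # 20
```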
The parameters for this first experiment are summarized in Table 3.1, along with
the inhibition constant that was used. The inhibition constant is a multiplicative factor
used in addition to phonological overlap to determine the strength of inhibition between
two lexical assemblies.
Table 3.1: Parameters for Experiment 1. Inhibition factor is a constant multiplied by the inverse of cognitive distance to determine inhibition strength between two lexical cell assemblies.
Parameter                                  Value
Phoneme interval                           100 ms (10 simulated steps)
Target and prime length (in phonemes)      4
Number of subjects                         37
Number of repetitions per subject          15
Number of words represented                169
Inhibition factor                          0.7
3.1.2 Results and discussion
A typical trial run demonstrating lexical competition during the presentation of
both the prime and the target is presented in Figure 3.1. In this particular trial, “b-r-oh-
sh” was the prime word and “b-r-oh-f” was the target. The activity of the lexical units
clearly shows that the trial was successful in that recognition was correct.
The summary information for Experiment 1 is displayed in Table 3.2. The error
rate, which is expected to be high because the heuristic used for determining success is
not perfect, was fairly consistent across all groups, and so it was not tested further.
However, it should be noted that only the response times of successful trials were
included in the analysis. Additionally, for each subject tested there were four condition
means calculated from the 15 replications of each condition.
Figure 3.1: The activity levels of active lexical units during a trial run are presented. The onset of the prime (“b-r-oh-sh”) and target (“b-r-oh-f”) are indicated by asterisks (*) below the x-axis. The incompleteness of the curves at t < 350 is the result of a recording unit that only begins recording a lexical unit’s activity once it surpasses a certain threshold, for computational reasons.
Table 3.2: Mean reaction time and error rates as a function of priming condition.

Condition            RT      Error rate
No prime             39.87   .11
1-phoneme overlap    37.98   .09
2-phoneme overlap    37.74   .09
3-phoneme overlap    35.24   .10
A one-way analysis of variance (ANOVA) was conducted in order to reveal any
statistically significant main effects of priming condition. The main effect of priming
condition was indeed significant [F(3,144) = 17.49, p < 0.0001]. It was also determined
that the 1-phoneme condition was different from the no prime condition and the 2-
phoneme condition was different from that of the 3-phoneme overlap condition [t(36) =
4.77, p < 0.00001], [t(36) = 3.79, p < 0.0006]. That is, phonological overlap produced
a facilitatory effect and this effect increased as overlap increased.
The facilitation observed in this experiment for all phonological overlap
conditions is not consistent with Slowiaczek and Hamburger (1992; 1996). However, in
the earlier study, facilitation was observed in the low similarity case and at the time was
attributed to a prelexical effect. That is, because the target words in these overlap
conditions are sharing input from phoneme units with the primes, they too are being
partially stimulated as the prime is being activated. Therefore, when the target word is
actually spoken, the subject is able to respond more quickly. The later study attributed
this facilitation to strategic processing whereby the subject, upon noticing the
relationships between many of the primes and targets in the trials, begins to make
expectations that this will again be the case in future trials. In this later study, expectancy
was controlled by manipulating the number of trials containing phonological overlap
across subject groups, a variable termed the phonological relatedness proportion (PRP).
Only groups containing a majority of trials with phonological overlap (high PRP,
75%) displayed facilitation effects (Hamburger & Slowiaczek, 1996).
Because the results presented here do not agree with the behavioral
evidence, it appeared that this architecture, as currently implemented, was not correct.
Specifically, because facilitation increased with phonological overlap even as similarity
became high (3-phoneme overlap), the results suggested that the level of inhibition was
perhaps too low. A follow-up experiment was performed, adding the level of inhibition
as an independent variable so that its effect on the performance of the network could be
analyzed.
3.2 Experiment 2
3.2.1 Design
The experiment as described in Section 3.1.1 was repeated for four different
inhibition levels: 0.7, 0.9, 1.7, 2.0. These values are used to scale all of the inhibitory
weights assigned between lexical units in the simulation. One-hundred and forty-eight
subjects were used (37 for each inhibition group) in total. The previous data from
Experiment 1 was used as data for the inhibition level of 0.7 in this experiment. Network
parameters were also exactly the same as in Experiment 1 (see Table 3.1).
3.2.2 Results and discussion
The error rates of the networks as a function of inhibition and overlap are
presented in Figure 3.2. Error rate appears to increase with the level of inhibition,
although this trend is not consistent across all inhibition levels. Because the error
rates are relatively consistent across the overlap conditions, this is unlikely to
compromise the analysis of response time across that variable, and so it was not
analyzed further. It should be noted, however, that the problem may involve
diminishing levels of activity as the speech “signal” is passed upwards through the levels
(see section 4.3). Additional research will have to be done in order to discover a way to
control this effect.
As was done in the first experiment, all trials containing an error in any of the
conditions were discarded from the condition mean calculation for each subject. Since
there was adequate replication at each condition level, it is plausible to continue with the
analysis of response latencies as a function of inhibition level. Figure 3.3 presents a
summary of these data.
Figure 3.2: Mean error rates across prime conditions, grouped by lateral inhibition level.
A two-way analysis of variance was used to investigate the possible main effects
of priming condition and lateral inhibition, and more importantly, determine if there is an
interaction. Significant main effects of both priming condition [F(3, 576) = 23.972, p <
0.001] and lateral inhibition [F(3, 576) = 23.839, p < 0.001] were indeed found.
Additionally, there was a significant interaction between the two factors [F(9, 576) =
8.000, p< 0.001].
The significant interaction was decomposed further by running four one-way
ANOVAs across the four priming conditions for each level of inhibition. There was a
significant main effect of priming condition for Inhibition 1 [F(3,144) = 17.49, p <
0.001] and Inhibition 2 [F(3, 144) = 19.78, p < 0.001]. Alternatively, there was no main
effect of priming condition for Inhibition 3 [F(3,144) = 2.18, p = 0.09], but there was a
main effect for Inhibition 4 [F(3,144) = 4.70, p < 0.004]. Follow-up t-tests determined
that only in the Inhibition 4 case was the 3-phoneme overlap condition slower than the
1-phoneme overlap condition when using a two-tailed t-test [t(36) = 4.52, p < 0.0001].

Figure 3.3: Mean response times for all priming conditions grouped by levels of lateral inhibition. The fourth inhibition condition (Inh. = 2) replicates the findings of Slowiaczek & Hamburger (1992).
The 1- and 3-phoneme overlap mean difference in the Inhibition 3 group did not reach
significance [t(36) = 1.76, p = 0.09]. Additionally, the 3-phoneme overlap condition was
not significantly slower than the 2-phoneme overlap [t(36) = 1.47, p = 0.15], a result that
agrees with the data of Hamburger and Slowiaczek (1996). Finally, in all levels of
inhibition, response time was faster with 1-phoneme overlap than with no prime;
Inhibition 1: [t(36) = 4.77, p < 0.0001], Inhibition 2: [t(36) = 4.98, p <
0.0001], Inhibition 3: [t(36) = 3.46, p < 0.002], Inhibition 4: [t(36) = 4.73, p < 0.0001].
The significant interaction between inhibition level and priming condition was the
result that was needed to provide support to the theory that lateral inhibition at the lexical
level is the mechanism responsible for high similarity interference (Hamburger &
Slowiaczek, 1996). At lower levels of inhibition, lexical competition never resulted in
interference, and there was always an increasing facilitatory effect as the number of
overlapping phonemes increased. Only when inhibition was raised to a sufficiently high
level (somewhere between 1.7 and 2.0) did the interference phenomenon have a
significant impact.
As this study deals with a simulated model of a particular aspect of speech
perception and is therefore markedly simplified, it would not be useful to belabor
all of the inconsistencies between the actual behavioral evidence and the
results reported here. The larger picture that can be drawn from this simulation is that
inhibition is likely to be responsible for the high similarity interference. Additionally,
across all inhibition levels, low similarity facilitation occurs, so it can be concluded that
the two effects are dissociable—most likely the result of two different processes. In this
model, low similarity facilitation occurs because of the similarity in phonemic inputs
between related primes in the absence of strong lateral inhibition at the lexical level.
4. Other modeling issues
Inhibition at the lexical level was shown to be the primary factor affecting
competition in the simulated experiments described in the last section. As the level of
inhibition increased, the response time of the target unit increased in the high similarity
condition. Because this simulation is perfectly controlled, inhibition can be implicated as
the causal factor underlying this high similarity interference. However, because this is a
theoretical simulation, this result cannot automatically be generalized to actual human
beings, but it does provide good support for inhibition as a mechanism.
In this section, other issues that arose during experimentation will be discussed
and their effects on network performance and lexical competition will be examined. If
there is a reasonable explanation for why something was problematic, it will be
discussed; otherwise, it will be left as a topic for future research.
4.1 Speech speed
The first major factor that was observed to have a dramatic effect on network
accuracy was the phoneme presentation interval. To reiterate, this was the length of time
that elapsed between the sequential stimulation of the phonemic units that comprised a
speech stimulus. In the final experiments, this interval was clamped at 100 ms. An early
set of experiments was run, manipulating this presentation interval to see if there was any
observable trend in error rate (Fig 4.1).
At first observation, this performance trend would suggest that the network was
only capable of handling a certain range of speech speeds adequately. Speech speeds that
fell outside of this range saw a marked drop in recognition accuracy. It may be postulated that
this is, in fact, a weakness of the current network model. Perhaps it is not robust enough
for this particular application, because humans seem to be very adept at understanding
many speeds of speech.
An alternative view, however, is to allow for the possibility that the modeled time
step in the simulation is not perfectly scaled to its real-world counterpart or that the
different cognitive levels simulated have different time scales. The simulated time
interval was the result of a modeling effort by Kaplan et al. (1991) to support Miller
and Marlin's (1984) empirical findings suggesting that 5 seconds was the span of time
after the presentation of an item during which information may be consolidating.

Figure 4.1: Pure error rate (target unit was overall winner) vs. phoneme interval. A quadratic trend was applied to the data points and its fitness was moderately good (R² = 0.9201). The minimal error rate occurred when the interval was set to 100 ms, so this setting was used during the priming experiments.

The
original designers of TRACE assumed that the model would have to be improved as the
network model was tested further. This suggestion to reinvestigate the scaling of time
in the model is highly speculative, but it may be worth the effort. If, however, time is
actually slower in the model than in reality, then the range of phoneme intervals where
performance was adequate would be increased, according to the trend depicted in Figure
4.1.
4.2 Sequence bias
In section 2.0.2, it was suggested that a two-tiered model is incapable of
representing the highly sequential nature of a word stimulus. There must be some way to
encode the notion of order, because otherwise a word would merely be a set of phonemes
in which sequence did not matter. Because there are many instances in human languages
in which the meaning of a word is altered when constituent units are scrambled, it is
obvious that some mechanism or organization exists to account for the inherent sequence
of a word. The transitional layer was introduced in section 2 as a means to overcome the
initial problem of encoding sequence. However, an alternative mechanism could encode
this bias at the transitional-lexical interface. In such a scenario, a method of assigning
weights would take the position of the transition in the word into account and in essence
penalize transitions occurring late in the word. Tests showed that this approach was
effective in achieving successful word recognition, and it was supported by the
evidence for recency effects in learning serial lists, but an alternative hypothesis may also
be correct.
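The positional weighting scheme just described can be sketched as follows. The exponential form and the decay rate are assumptions for illustration, not the thesis's exact parameters: each transition (adjacent phoneme pair) projects to the word's lexical unit with a weight that shrinks the later the transition occurs, encoding sequence bias at the transitional-lexical interface.

```python
import math

def transition_weights(phonemes, decay=0.5):
    """Assign a transitional-to-lexical connection weight to each adjacent
    phoneme pair, penalizing transitions that occur late in the word."""
    weights = {}
    for position, pair in enumerate(zip(phonemes, phonemes[1:])):
        weights[pair] = math.exp(-decay * position)  # positional decay
    return weights

# /blap/ -> transitions /bl/, /la/, /ap/ with monotonically decreasing weights
w = transition_weights(['b', 'l', 'a', 'p'])
```

Because the test word sets excluded words with duplicate transitions (section 4.4), each pair maps to a single weight here without loss.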
This alternative may be that an even more complicated hierarchy exists (Fig. 4.2).
That is, it is possible that two or even three layers could exist between the phonemic level
and the lexical level. Additionally, the lexical level modeled here is also likely to be the
root of a higher hierarchy that includes support for polysyllabic words. The relatively
high overall error rate of this model, which hovered close to 10%, perhaps provides support
for this hypothesis. When the network fails to make a correct prediction, it is because other
words that received the same degree of input are competing for dominance. It was not
always the case, either, that the winner of the competition in these unsuccessful
Figure 4.2: Other hierarchical possibilities for the lexicon. The levels drawn with solid lines correspond to levels represented in the current model. One possible layer in between the transitional and monosyllabic word layers that makes linguistic sense is a syllable layer.
trials was phonologically similar to the target. For example, suppose three words are
receiving comparable input across some region of the lexical map. Two of the three are
near one another in this region and so are inhibiting each other. The third competitor,
because it is receiving excitatory input on par with these two competing neighbors but
lies outside their inhibitory neighborhood, is likely to win the competition.
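This scenario can be illustrated with a toy iteration. The update rule and all parameter values below are hypothetical choices for the sketch, not the simulator's actual equations: two near neighbors inhibit one another while a third, equally driven unit escapes inhibition.

```python
def simulate_competition(steps=50, drive=0.1, inhibition=0.7, decay=0.1):
    """Words A and B are near neighbors on the lexical map and laterally
    inhibit one another; word C receives comparable excitatory input but
    lies outside their inhibitory neighborhood."""
    a = b = c = 0.0
    for _ in range(steps):
        a_next = a + drive - decay * a - inhibition * b
        b_next = b + drive - decay * b - inhibition * a
        c_next = c + drive - decay * c  # no lateral inhibition reaches C
        a, b, c = max(a_next, 0.0), max(b_next, 0.0), max(c_next, 0.0)
    return a, b, c

a, b, c = simulate_competition()
```

The mutual inhibition holds A and B at a low equilibrium while C climbs unopposed, so C wins the competition despite receiving no more input than its rivals.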
These types of situations occurred in the simulation here because the connection
weights were being chosen from a normal distribution of real numbers. Although its
probability is low, it is possible to assign a high weight value to a particular connection
that would typically and correctly only receive a low value. Because the real language
system is particularly effective in very noisy environments, the vulnerability witnessed
here in the simulated environment most likely does not exist in reality. Instead, it is
likely the case that more intermediate representations exist to further encode sequential
bias into the final representation of the word at the lexical level. The method of assigning
weights between the transitional and lexical layer was adopted at first because it was
effective and because it was computationally feasible. It was noted earlier that the
number of transitional units in the second layer was the number of legal combinations of
phonemes in the language. A second intermediary layer on top of the transition layer
would require N more units, where N is the number of legal combinations of transitional
elements. This combinatorial explosion made the implementation of a secondary
transitional layer infeasible using the current research simulator.
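The combinatorial growth can be made concrete with a small count. The legality rule below (any ordered pair of distinct elements) is a stand-in; the thesis's phonotactic constraints are more restrictive, but the scaling behavior is the same.

```python
def layer_sizes(num_phonemes, legal=lambda a, b: a != b):
    """Count the units needed at each level: the transitional layer needs
    one unit per legal phoneme pair, and a second intermediary layer would
    need one unit per legal ordered pair of transitions."""
    phonemes = range(num_phonemes)
    transitions = [(a, b) for a in phonemes for b in phonemes if legal(a, b)]
    n = len(transitions)
    return n, n * (n - 1)  # second-layer size if all ordered pairs were legal

first, second = layer_sizes(10)
```

Even a modest inventory of 10 phonemes yields 90 transition units under this rule, and a further layer over those transitions would require thousands more, which is the explosion that ruled out the implementation.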
4.3 Error rate and lateral inhibition
The data from Experiment 2 showed a trend where raising the level of lateral
inhibition increased the number of errors in word recognition. This increase in errors was
not specific to any of the priming conditions, however. Because lateral inhibition acts to
reduce competition and thereby focus activity, it should improve performance accuracy.
This was not the case, however, and so the issue must be
addressed.
Initial experimentation with networks built using a higher level of inhibition
suggested that the error-prone performance was due to a gain problem. That is, as more
inhibition was present at the lexical level due to lateral inhibition (not just regional
inhibition) the level of activity relative to incoming input was globally reduced in this
layer. An additional analysis of variance was conducted on a random sampling of 2080
trials from each inhibition level to determine whether this effect was actually significant (means
are plotted in Figure 4.3). As expected, there was a significant main effect of inhibition
on the level of lexical assembly activity (the activities of the target assemblies were used
from both successful and unsuccessful trials) [F(3, 8316) = 401.63, p < 0.0001].
Follow-up t-tests indicated that all group means were significantly different, 0.7/0.9:
[t(2080) = 6.07, p < 0.05], 0.9/1.7: [t(2080) = 18.84, p < 0.05], 1.7/2.0: [t(2080) = 3.92, p
< 0.05]. This suggests that the stronger level of inhibition is causing the
Figure 4.3: The maximum lexical target activities are plotted across levels of inhibition (0.7, 0.9, 1.7, 2.0). All means plotted were significantly different (see text).
speech signal, once it reaches the lexical level, to be weaker, and not sufficient to cause
excitation leading to a dominant lexical winner. Future research will have to address this
issue and reverse its effect.
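One candidate remedy for this gain problem (an assumption on my part, not a mechanism the thesis implements) is divisive normalization: rescale the lexical layer's post-inhibition activity to a constant total, so that raising lateral inhibition sharpens the competition without globally depressing the layer's response to the speech signal.

```python
def normalized_lexical_activity(excitation, inhibition):
    """Each unit is suppressed by the summed activity of its competitors,
    then the surviving activity is rescaled to a constant total, so the
    overall gain of the layer no longer falls as inhibition rises."""
    total_excitation = sum(excitation)
    net = [max(e - inhibition * (total_excitation - e), 0.0) for e in excitation]
    remaining = sum(net)
    if remaining == 0.0:
        return [0.0] * len(net)
    return [n / remaining for n in net]

weak = normalized_lexical_activity([0.4, 0.3, 0.1], inhibition=0.5)
strong = normalized_lexical_activity([0.4, 0.3, 0.1], inhibition=0.9)
```

Under this scheme, stronger inhibition concentrates the fixed activity budget on the leading unit rather than weakening the whole layer, which is the behavior the data in Figure 4.3 would call for.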
4.4 Word representation
When building the test word sets for the experiments, a simplification was made
to remove all words that had duplicate phonemes or transitions (e.g., /blalb/), although
words with these characteristics were still represented in the network. This was done to
avoid any confounding variables that might have been created by incorrectly representing
the connections that define these words. At first this problem was addressed by imposing
a rule whereby secondary and tertiary connections from a transitional unit to the same
lexical unit were weaker than the primary connection. This initial solution, before it was
removed, was rationalized by learning theory.
Hebbian learning with cell assemblies can be characterized as a resource problem,
because underlying the associations between assemblies are physiological factors which
include the number of synapses between the neuronal members of a cell assembly.
Neurotransmitters need to be produced for each synapse, and so the active status of a
synaptic cleft is determined in part by how fast each cell can produce and transport these
chemical messengers to the terminus of each axon. Additionally, the process by which
learning occurs, which has been postulated to be the result of a long-term postsynaptic
membrane change, is also rate-limited by protein synthesis (in order to build
membrane channels, etc.). These resource limitations are likely to place a ceiling on the
strength of the connection between two cell assemblies. Therefore, if a transitional unit is
already associated with a word unit, its second association is likely to be limited in
strength because they are sharing the same underlying neuronal connections or because
the number of synapses a given cell can support is limited. Because a complete language
system will need to address this issue, additional empirical evidence is sought to provide
a sound basis towards any particular implementation.
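The resource-limited learning rule discussed above can be sketched as a shared synaptic "budget". The update below, including its parameter values, is a hypothetical illustration rather than the rule the thesis actually imposed and later removed: all connections leaving one transitional unit draw on a fixed capacity, so once the primary connection has claimed most of it, secondary and tertiary associations are necessarily weaker.

```python
def add_association(weights, pre, post, increment=0.3, capacity=1.0):
    """Resource-limited Hebbian update: strengthen the (pre, post)
    connection by `increment`, but never let the summed weight of all
    connections leaving `pre` exceed `capacity`."""
    used = sum(w for (p, _), w in weights.items() if p == pre)
    available = max(capacity - used, 0.0)
    weights[(pre, post)] = weights.get((pre, post), 0.0) + min(increment, available)
    return weights

w = {}
add_association(w, 'bl', 'blab', increment=0.8)  # primary connection
add_association(w, 'bl', 'blub', increment=0.8)  # secondary: budget nearly spent
```

The cap stands in for the physiological limits named above: neurotransmitter production and the number of synapses a cell can support.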
4.5 Typicality effect
Typicality in a TRACE unit can be modeled by increasing internal long-term
connection strength, which represents the strength of synaptic efficacy between the
neuronal members of the assembly. Frequently used linguistic components (phonemes,
words, or composites of these) are likely to have tightly coupled internal
structures, refined through experience, that allow the assembly to reverberate with less
excitatory input. Given the same amount of excitatory input, an assembly with a highly
developed internal structure as opposed to one that is weaker internally will achieve a
higher activation level, thereby having a greater excitatory or inhibitory effect on
assemblies with which it is connected. Although this seems like a reasonable
implementation, typicality was not modeled here in order to reduce the chance for
confounding factors to be introduced.
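Though typicality was not modeled, the mechanism described above can be sketched in a few lines. The update rule and parameter values are assumptions for illustration: an assembly's internal long-term connection strength feeds its own activity back each step, so a well-practiced assembly reaches a higher activation level from the same external input than a weakly structured one.

```python
def assembly_activation(external_input, internal_strength, steps=20, decay=0.2):
    """Iterate an assembly's activity: each step it decays, receives
    external input, and is re-excited by its own members in proportion to
    the internal (recurrent) connection strength. Activity is capped at 1."""
    activity = 0.0
    for _ in range(steps):
        activity = min(1.0, (1.0 - decay) * activity
                       + internal_strength * activity
                       + external_input)
    return activity

weak = assembly_activation(0.05, internal_strength=0.05)    # infrequent word
strong = assembly_activation(0.05, internal_strength=0.15)  # typical word
```

The higher plateau reached by the strongly coupled assembly is what would give typical words their greater excitatory or inhibitory influence on connected assemblies.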
5. Conclusion
A key feature of this simulation is that a particular problem in cognition (in this
case word recognition) can be approached without regard to content (Kaplan et al.,
1990). The cell assemblies in this simulation are given arbitrary labels so that they can
be interpreted as phonemes or words in a discussion, but the underlying computational
output of the system attends only to spatial relationships. Lateral inhibition
between cognitive assemblies was computed based upon the distance between the
corresponding assemblies. This distance was inferred from a phonological structural
relationship, of course, but this in turn was discovered through behavioral testing. The
structure-sensitive processing on display here is particularly interesting because it extends
naturally to a general theory of cortical function.
If one compares, for the moment, the human cortex and a digital computer’s
memory, it will immediately be apparent that they cannot possibly operate in the same
fashion. There is general support for the idea that the information of a cognitive “memory” is stored
in the weight matrix of synaptic efficacies between neurons. That is, the biochemistry at
each neural synapse has been modulated over time and through perceptual experience so
that the collective populations of neurons form an internal representation, or cognitive
map, of the outside world. In a digital computer, information stored in memory is
typically transferred to a register—a transient storage location that is local to the CPU
and therefore very fast to access—before computation is performed on it. This is simply
not a possibility in a brain, because this “register” would have to be composed of neurons
as well, and the time it would take to emulate the information contained in the synaptic
matrix of a particular area of memory in a temporary structure for separate processing
would be metabolically impossible, given the time constraints (Amit, 1995). Instead, the
memory must be accessed “in-place”, and it is because of this that a flexible set of
content-independent yet spatially-based operations becomes useful.
The cognitive process under study here was speech perception. Of particular
interest is the organization of the internal representations of words, phonemes and other
linguistic structures in the area of memory dedicated to language. Behavioral evidence
suggests that a relationship exists, based upon word structure, that causes a pattern of
interference in response times indicative of some kind of competition (Hamburger &
Slowiaczek, 1996; Slowiaczek & Hamburger, 1992). Using a computer simulation, a
series of experiments in this study showed that local inhibition between word units
restricted to the lexical level could account for this interference.
How do these results fit into a grand theory of cognition? In order to begin to
answer this question, it is necessary to consider Hebb’s original cell assembly hypothesis
again and how this might relate to the development of the language system (Hebb, 1949).
Hebb originally recognized the need for some kind of structure that would initially
respond to a stimulus then reverberate for some time after the stimulus is gone. Although
at the time he was not a neurophysiologist but instead a psychologist, he speculated that it
would be adaptive for two neurons that respond synchronously to a given external
stimulus to affect one another metabolically in such a way so that on the next
presentation of the same stimulus, there would be a stable and more unitary response.
More than 40 years later, neurophysiologists are beginning to find evidence for these
localized patterns of reverberation in macaques after being trained to respond to different
stochastically-generated visual images (Miyashita, 1988; Miyashita & Chang 1988;
Sakai & Miyashita, 1991). The key results in these experiments relevant to the
discussion here were that the reverberations were consistently identifiable, lasted for as
long as 16 seconds, and were localized in cortical space (about 1 mm² of anterior ventral
temporal [AVT] cortex) (Sakai & Miyashita, 1991). All of this evidence lends great
support to Hebb’s construct of a cell assembly (Amit, 1995), or at least to some primary
component of a complete cell assembly that may be more distributed
in nature. However, for the types of representations discussed in this project, especially
phonological representations, it is most likely the case that they are
concise and localizable, not requiring additional sensory information that may be
obtained from transcortical regions.
Furthermore, Hebb’s understanding of learning in the brain can be applied here to
link the behavior of lexical systems discussed in earlier sections with their initial neural
development. The associative areas of cortex are typically referred to as areas
responsible for multimodal integration—a junction for many types of sensory inputs.
These areas are likely to be sufficiently plastic, initially, to configure themselves to
receive a wide variety of inputs and to form different types of internal representations
(Rauschecker & Sejnowski, 1994). It is possible, therefore, that the pattern of responses
they generate initially, due to some cross-cortical input, are more or less random,
especially when the animal is at an early age. From this point, however, Hebb’s learning
rule can explain how related responses could begin to be passively “recognized” by the
neural system and used as a signal to build more structure between the neurons involved
in a response. With repeated exposure to the stimulus, these same neurons that responded
initially will continue to respond, and their activity will begin to be supported by
additional excitatory input from members of a forming cell assembly. Eventually, the cell
assembly may become so well-formed that its members respond more or less as a unit and
reverberate for a longer duration than they would be capable of in isolation from each
other.
However, consider the scenario of a new stimulus now produced in the
environment which is very much similar to the previous stimulus for which an internal
representation has already been formed. How different does this stimulus “appear” to the
neural system? That is, how similar is the excitation for the neuronal members of the cell
assembly whose development was discussed above? It is quite likely that this new
stimulus is perhaps the old stimulus, but viewed from a different vantage point.
However, because we are dealing with association areas, it is reasonable to assume that
object feature detection is accomplished at lower levels of the cognitive system and
remains constant, unless the perspective has changed drastically (Amit, 1995). Even if
this is not the case, multiple representations of the same object are not an impossibility in
association areas, as long as it is not required that every possible configuration be
represented, because then this becomes an impossible storage-space problem. The
question being asked is related to the general problem of category formation, and how
this in turn relates to spatially-aware cognitive processing.
Suppose some cell assembly, CA, is well-formed and responds to a particular
stimulus A. Another novel stimulus B is introduced, and it shares many of the
characteristics of A, but it also has features that do not fit with the feature set of stimulus
A. It is likely that neurons that are not members of CA in addition to some subset of CA
will respond to B. Let these neurons external to CA plus the neuronal population that
respond to both stimuli be population CB. If this is indeed how cell assemblies develop,
then inhibition (together with short-term connection strength; see section 1.1.1) is a
mechanism that could help two distinct population responses form from these two
sufficiently different stimuli. The neural members that respond to both can act as
members of both assemblies (Rauschecker, 1995). However, the other respective neural
members can be made distinct in the long-term through inhibitory connections.
Consider what would happen if one particular stimulus, perhaps stimulus A, was
presented and inhibitory connections did not exist. CA neurons would begin to activate,
and the neurons whose membership spanned the two cell populations would also be
excited, either directly by the stimulus or through their recurrent connections with the other
members of CA. Because CB is now also receiving internal excitatory input through this
shared population, it too would begin to activate, perhaps more slowly. The result of this
stimulus presentation is a more diffuse response than is appropriate.
If inhibitory connections were now installed in this system between the separate
non-overlapping members of both cell assemblies CA and CB, the representations could
function distinctly. The inhibition prevents both cell assemblies from activating in their
entirety and so definite perception is achieved. This definiteness is implemented with
efficiency too, because it does not require complete distinction between members of a
cell assembly.
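The CA/CB scenario above can be run as a toy simulation. All parameter values here are hypothetical choices for the sketch: stimulus A drives CA's private members and the members shared by CA and CB, and the comparison is between a network with and without an inhibitory link between the assemblies' non-overlapping members.

```python
def present_stimulus_A(inhibition, steps=30, drive=0.2, decay=0.3, coupling=0.4):
    """Track three populations while stimulus A is presented: CA's private
    members, the members shared by CA and CB, and CB's private members.
    Without inhibition, excitation leaks through the shared population into
    CB's private members, producing the diffuse response described in the
    text; an inhibitory CA -> CB link keeps the representations distinct."""
    ca_private = shared = cb_private = 0.0
    for _ in range(steps):
        ca_next = (1.0 - decay) * ca_private + drive + coupling * shared
        sh_next = (1.0 - decay) * shared + drive + coupling * ca_private
        cb_next = ((1.0 - decay) * cb_private + coupling * shared
                   - inhibition * ca_private)  # inhibitory CA -> CB link
        ca_private = min(ca_next, 1.0)
        shared = min(sh_next, 1.0)
        cb_private = min(max(cb_next, 0.0), 1.0)
    return ca_private, cb_private

diffuse_ca, diffuse_cb = present_stimulus_A(inhibition=0.0)
sharp_ca, sharp_cb = present_stimulus_A(inhibition=0.6)
```

With the inhibitory link in place, CA activates fully while CB's private members stay silent, so definite perception is achieved even though the two assemblies still share a neuronal population.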
The lexical representations that were modeled here are conceivably developed in
the fashion outlined above. The behavioral evidence suggests this to be the case in that a
structural relationship was discovered that was linked with interference in response times
during a task where these representations were being accessed (Hamburger &
Slowiaczek, 1996). The results presented here provide support for the idea that this
interference phenomenon results from inhibition between phonologically similar lexical
representations. It is possible that physical proximity in lexical cortical areas is an
artifact of the continual evolution of phonological and lexical representations in early
development (i.e., /ta/ begins to be differentiated from /da/). That is, early
representations for language sounds may split when a single categorical representation no
longer remains consistent with learning signals being received from the environment (a
realization through some teaching cue that there exists both a “toe” and a “doe”, for
example, and that they are separate objects). These representations may become more
discrete using the inhibitory technique described above. It should be emphasized that this
solution seems optimal with respect to representational space because the commonality of
features is retained in the sharing of neuronal members between representations. This is
highly speculative, however, and would need to be tested empirically, but the recent
success in finding neurophysiological correlates of reverberatory modules (Amit, 1995)
suggests that methods for doing this are fast approaching.
Acknowledgements
I would like to thank Professor Eric Chown for his unyielding support during the
course of this research; he truly was an inspirational mentor. Additionally, many thanks
are extended to the members of the honors committee, Louisa Slowiaczek, Rick
Thompson, and Eric Chown, who each read my thesis and returned great feedback.
Special thanks are also due to Professor Louisa Slowiaczek for providing both
extremely interesting data to model and theoretical suggestions that aided the research.
Finally, I would like to thank my parents for continuing to believe in me throughout the
past year even through its periods of frustration.
References
Abraham, W.C. & Goddard, G.V. (1983). Asymmetric relationships between
homosynaptic long-term potentiation and heterosynaptic long-term depression.
Nature 305, 717-19.
Amit, D. J. (1995). The Hebbian paradigm reintegrated: Local reverberations as internal
representations. Behavioral and Brain Sciences, 18(4), 617-657.
Brodie, S.E., Knight, B.W., & Ratliff, F. (1978). The response of the Limulus retina to
moving stimuli: a prediction by Fourier synthesis, Journal of General Physiology,
72, 129-166.
Calvin, W.H. (1995). Cortical columns, modules, and Hebbian cell assemblies. In The
Handbook of Brain Theory and Neural Networks (ed. Arbib, M.A.), Cambridge,
MA: MIT Press, pp. 269-272.
Celsis, P., Doyon, B., Boulanouar, K., Pastor, J., Démonet, J. & Nespoulous, J. (1999).
ERP correlates of phoneme perception in adults. NeuroReport, 8, 919-924.
Chown, E. (1994). Consolidation and Learning: A Connectionist Model of Human Credit
Assignment. Doctoral dissertation. The University of Michigan.
Feldman, J.A. & Ballard, D. H. (1982). Connectionist models and their properties.
Cognitive Science, 19, 1-52.
Frauenfelder, U. H. & Tyler, L. K. (1987). The process of spoken word recognition: An
introduction. Cognition, 25, 1-20.
Hamburger, M. & Slowiaczek, L. M. (1996). Phonological priming reflects lexical
competition. Psychonomic Bulletin & Review, 3(4), 520-525.
Hebb, D. O. (1949). The Organization of Behavior. John Wiley.
Hetherington, P. A. & Shapiro, M. L. (1993). Simulating Hebb cell assemblies: the
necessity for partitioned dendritic trees and a post-not-pre LTD rule. Network, 4:
135-153.
Kaplan, S., Sonntag, M. & Chown, E. (1991) Tracing recurrent activity in cognitive
elements (TRACE): A model of temporal dynamics in a cell assembly.
Connection Science, 3, 179-206.
Kaplan, S., Weaver, M. & French, R. (1990). Active symbols and internal models:
Towards a cognitive connectionism. AI & Society, 4:51-71.
Kleinsmith, L. J., & Kaplan, S. (1963). Paired-associate learning as a function of arousal
and interpolated interval. Journal of Experimental Psychology, 65, 190-193.
Kinsbourne, M. (1982). Hemispheric specialization and the growth of human
understanding. American Psychologist 37(4), 411-420.
Liégeois-Chauvel, C., de Graaf, J.B., Laguitton, V., & Chauvel, P. (1999). Specialization of
left auditory cortex for speech perception in man depends on temporal coding.
Cerebral Cortex, 9, 484-496.
Magleby, K. L. (1987). Short-Term changes in synaptic efficacy. in Synaptic Function,
New York, NY: John Wiley & Sons.
McClelland, J.L., Rumelhart, D.E. & Hinton, G.E. (1986). Parallel Distributed
Processing Explorations in the Microstructure of Cognition, Cambridge, MA:
The MIT Press.
Miller, R. R., & Marlin, N. A. (1984). The physiology and semantics of consolidation. In
H. Weingartner, & E. S. Parker (Ed.), Memory consolidation: Psychobiology of
cognition Hillsdale, NJ: Lawrence Erlbaum.
Milner, P. M. (1957). The cell assembly: Mark II. Psychological Review, 64: 242-52.
Miyashita, Y. (1988). Neuronal correlate of visual associative long-term memory in the
primate temporal cortex. Nature, 335: 817-20.
Miyashita, Y. & Chang, H.S. (1988). Neuronal correlate of pictorial short-term memory
in the primate temporal cortex. Nature 331:68-70.
Minsky, M. & Papert, S. (1969). Perceptrons. Cambridge, MA: The MIT Press.
Muller, D., Joly, M., and Lynch, G. (1988). Contributions of quisqualate and NMDA
receptors to the induction and expression of LTP. Science, 242: 1694-1697.
Murdock, B. B., Jr. (1962). The serial position effect of free recall. Journal of
Experimental Psychology, 62, 482-488.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard.
Pisoni, D.B. & Luce, P.A. (1987). Acoustic-phonetic representations in word
recognition. Cognition, 25: 21-52.
Rauschecker, J. P. (1995). Compensatory plasticity and sensory substitution in the
cerebral cortex. Trends in Neurosciences 18: 36-43.
Rauschecker, J. P. & Sejnowski, T. (1994). Processing of visual and auditory space and
its modification by experience. In: Advances in Neural Information Processing
Systems, vol. 6, ed J. D. Cowan, G. Tesauro & J. Alspector.
Rochester, N., Holland, J. H., Haibt, L. H., & Duda, W. L. (1956). Tests on a cell
assembly theory of the action of the brain, using a large digital computer. IRE
Transactions on Information Theory IT-2: 80-93.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and
organization in the brain. Psychological Review 65:386-408.
Sakai, K. & Miyashita, Y. (1991) Neural organization for the long-term memory of
paired associates. Nature 354:152-55.
Slowiaczek, L. M. (1994). Semantic priming in a single-word shadowing task.
American Journal of Psychology, 107(2), 245-260.
Slowiaczek, L. M. & Hamburger, M. (1992). Prelexical facilitation and lexical
interference in auditory word recognition. Journal of Experimental Psychology:
Learning, Memory and Cognition, 18(6), 1239-1250.
Sonntag, M.L. (1991). Learning sequence in an associative network: A step towards
cognitive structure. Doctoral dissertation. The University of Michigan.
Steinschneider, M., Schroeder, C.E., Arezzo, J.C., & Vaughan, Jr., H.G. (1995).
Physiologic correlates of the voice onset time boundary in primary auditory cortex
(A1) of the awake monkey: temporal response patterns. Brain and Language, 48:
326-340.
White, G., Levy W. B., & Stewart, O. (1990). Spatial overlap between populations of
synapses determines the extent of their associative interaction during the
induction of long-term potentiation and depression. J. Neurophysiol. 64: 1186-98.
Zatorre, R.J., Meyer, E., Gjedde, A.L. & Evans, A.C. (1996). PET studies of phonetic
processing of speech: review, replication, and reanalysis. Cerebral Cortex, 6, 21-
30.