A Connectionist Model of Competition at the Phonological-lexical
Interface during Speech Perception
An honors project for the Departments of Computer Science and Psychology
By Eric M. Forbell
Bowdoin College, 2000
© 2000 Eric Forbell
Table of Contents
Abstract
1. Introduction
1.1 An overview of the TRACE and multiTRACE models
1.1.1 The original TRACE model
1.1.2 MultiTRACE
1.2 Speech perception
1.2.1 Lexical priming
2. Speech Simulation
2.0.1 Phonemic layer
2.0.2 Transitional layer
2.0.3 Lexical layer
2.1 Lateral inhibition and lexical competition
2.1.1 Lexicon organization
2.2 Sampling lexical networks
2.2.1 “Subject” networks
3. Simulating the lexical priming experiments
3.1 Experiment 1
3.1.1 Design
3.1.2 Results and discussion
3.2 Experiment 2
3.2.1 Design
3.2.2 Results and discussion
4. Other modeling issues
4.1 Speech speed
4.2 Sequence bias
4.3 Error rate and lateral inhibition
4.4 Word representation
4.5 Typicality effect
5. Conclusion
Acknowledgements
References
Abstract
A connectionist architecture comprised of Hebbian cell assemblies was developed
and applied to the problem of speech recognition at the phonemic-lexical interface.
Speech was encoded in the model as the sequential activation of phoneme representations
connected to higher-level linguistic structures. An architectural decision concerning the
spatial organization of the top-level lexical map was supported by psycholinguistic
evidence suggesting a topographical layout in which cognitive distance was related to
initial phonological structure. Through computer simulation, lateral inhibition at this
lexical level was shown to be a necessary and sufficient mechanism to replicate the
findings of a series of lexical priming experiments. The results of these experiments
provided added support for a more general theory of cognition based upon Hebb’s
original cell assembly hypothesis.
1. Introduction
The brain and its emergent high-level cognitive functions are currently studied
from many different perspectives. Until recently, cognition was of interest only to
psychologists and philosophers, but that group has since expanded to include seemingly
unrelated fields such as computer science and mathematics. Many characteristics of
current approaches to cognition are due to contributions from researchers in these fields.
One such approach, called connectionism, was first introduced in the late 1950s
by the psychologist Frank Rosenblatt, following several computational discoveries in
theoretical computer science. He specified the design of a computational model called a
Perceptron, using simple computing elements connected in parallel, that attempted to
mimic the organization of the brain that had been found in early visual processing areas
(Rosenblatt, 1958). His specification involved clustering units into layers that impinge
upon higher and higher layers, though he was only able to successfully train networks of
two levels. The pattern classifications that can be solved using this architecture are
limited to those that are linearly separable, a fact later proven by two computer
scientists, Minsky and Papert (1969). The logic function exclusive-or (XOR), for
example, corresponds to a classification that cannot be represented by this architecture,
because the resulting category members are not linearly separable. Minsky and Papert’s
(1969) book, Perceptrons, was perhaps solely responsible for the lack of connectionist
research over the next twenty years because it stressed the incredibly difficult learning
problems associated with multi-layer networks.
While the connectionist approach was largely ignored, an alternative approach to
cognition, and intelligence in general, formed, and it was grounded in a commitment to
producing practical results. Artificial Intelligence research in the 60’s and 70’s
abandoned simulating the lower-level structure of the brain and focused on modeling
high-level cognitive concepts known as symbols. During this era humans were compared
to computers in that they are both types of information processors: they receive stimuli
from the environment, perform some kind of internal processing and then produce a
result. Computers were easily programmed to manipulate symbols and so it was assumed
that humans must compute in the same way.
It was not long before the limitations of symbol processing —the manipulation of
symbols according to the principles of second-order logic—started to weaken the
approach of the classical artificial intelligence researcher. Certain problems, such as
those involving pattern recognition and classification, tasks that humans excel at, were
simply not possible using this symbol framework because they were not well-grounded in
a representation of the environmental stimulus or they became intractable due to
complexity issues. These emerging limitations, together with the rekindling of research
on connectionist networks, changed the course of research in the early 1980s.
The possibilities of connectionism were reinvestigated at this time, as it was
thought that an extension of the information-processing model using a massively parallel
architecture could be useful (Feldman & Ballard, 1982; McClelland, Rumelhart, &
Hinton, 1986). The number of time steps a typical artificial intelligence program used to
solve a problem at that time was much too large to be biologically feasible, so a shift to a
connectionist framework seemed inevitable. The main issues with connectionist models
at the time were network stability, dealing with noise, forming representations of
sequences, and representing high-level cognitive concepts (Feldman & Ballard, 1982).
Despite the progress to date, these issues remain unresolved.
The connectionist modeling approach used here stems from a tradition stressing
the importance of cyclical, or recurrent, networks as the basic unit of cognition. This
recurrent circuit is primarily inspired by Hebb’s construct of cell assemblies (1949).
The basic unit of this connectionist model, called TRACE (Tracing Recurrent Activity in
Cognitive Elements), is actually part of a larger, more complete cognitive architecture
(Kaplan, Sonntag & Chown, 1991). A strength of a modeling approach committed to
providing holistic explanations of cognitive function is that of confidence: no
component added to the architecture may, without reasonable evidence, require a
mechanistic change in an already existing component.
The history of cell assembly theory will now be discussed and will be followed by
descriptions of the two models from which the current model evolved. In the section
following, a phenomenon of speech perception will be introduced that provides rich
constraints for neural models of cognition. The body of data that was modeled—whereby
differences in response times varied with word-form relationships in a priming
paradigm—offered support for specific neural architectures and mechanisms that extend
the applicability of TRACE.
1.1 An overview of the TRACE and multiTRACE models
Hebb, with limited neurophysiological knowledge at his disposal, claimed that
some structure in the nervous system must exist so that stimuli from the environment can
persist long after the stimuli are no longer present. During development, individual
neurons, later to comprise a cell assembly, are grouped together based upon similarities
in the temporal nature of their firings. That is, the connection between one cell and
another is strengthened when both cells are firing simultaneously:
"Whenever an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." (Hebb, 1949, p. 62)
At the basic level, this simultaneous firing pattern indicates a similarity in responding to
some particular part of the stimulus, and so the two cells combined form a more complete
and robust description of that input.
Mechanisms by which connections are strengthened between two cells were not
known during Hebb’s time, but his intuition for the requirement of some sort of synaptic
change was enlightening. Although his proposed “learning rule” is in some form part of
every learning algorithm since, Hebb fell short in explaining the significance that
inhibitory factors may have on a network’s operation. The rampant spread of activity
that would result without inhibition or some sort of neural fatigue was witnessed by
early connectionist researchers of the 1950s who tried to simulate Hebb’s theories using
a digital computer (Rochester et al., 1956). These researchers, through simulation, were
able to show that Hebb’s theory alone is not sufficient for the formation of cell
assemblies and that additional mechanisms such as inhibition are necessary (Milner,
1957).
The mechanism for synaptic strengthening seems to start with a post-synaptic
event known as long-term potentiation (LTP), as evidenced in the hippocampus of the rat
(Muller, Joly & Lynch, 1988; Tocco et al., 1992). In addition to synaptic strengthening, a
reverse process has also been found to occur, long-term depression (LTD), whereby
connection strength decreases when the post-synaptic cell is active while the pre-synaptic
cell is inactive, and not the other way around (White et al., 1990; Abraham & Goddard,
1983). Hetherington and Shapiro (1993) have supported this post-not-pre LTD
mechanism through computer simulations of cell assemblies that are unique, persistent,
and reliable in their activation in response to a stimulus.
This recent research on mechanisms of synaptic modification, and its agreement
to a certain extent with Hebb’s original hypothesis, has prompted further investigation of
the cell assembly construct. Milner’s inhibitory component was one such advance on
Hebb’s theory, and more recent theories now include components of neural fatigue and
short-term connection strength (STCS) (Kaplan, Sonntag & Chown, 1991). This recent
model, called TRACE, attempts to dynamically simulate the functioning of a population
of neurons comprising a cell assembly through a system of difference equations. It does
not, however, model the formation of an assembly through the strengthening and
weakening of internal connections, and assumes the cell assembly to be well-trained.
This is a shortcoming in the model, but the reduced complexity makes possible an
analysis of how assemblies may interact with one another when TRACE units are
networked. Models that learn assemblies at the lowest level have failed to achieve this
level of analysis.
Because the underlying structure of an assembly is not taken into account by this
model, it may seem no different from the classic artificial intelligence symbol systems
discussed earlier. However, the TRACE model was developed with biological constraints
in mind because it is intended to model human cognitive function, not merely intelligent
behavior. That is, the behavior of the unit both temporally and mechanistically is based
on psychological and neurophysiological data. The issue of constraining a cognitive
model is of prime importance these days, and so it must not be overlooked (ch. 3 in
Newell, 1990). Neuroscience provides bottom-up constraints, such as the fact that the
speed of neural firing is known and the timing of neural circuits can be interpolated. On
the other hand, cognitive psychology provides experimental evidence indicating the
response time in which basic deliberate cognitive acts (i.e., recognition, choice selection)
occur. These are top-level constraints on any system intending to model, and therefore
explain, basic cognitive behavior.
An introduction to the models themselves will now proceed from the simple
TRACE unit to the definition of a MultiTRACE network. The main components of each
model will be discussed to the point where further extensions to the model should be
clear, but a more formal treatment of the model specifications can be found elsewhere
(Kaplan, Sonntag & Chown, 1991; Sonntag, 1991; Chown, 1994).
1.1.1 The original TRACE model
TRACE is a mathematical specification of the functional dynamics of a single cell
assembly. The most important and complicated component of the model is perseveration
(P), or activity, as it is most often referred to in a connectionist system. Activity
represents the combined activity of all the neural elements that comprise the cell
assembly. Unlike individual neurons, the activity of a cell assembly is determined by the
complicated feedback connections between units and so has the ability to reverberate for
quite a long time. This reverberation period has two distinct phases: perception and
primary memory (Fig. 1.1). Activity is dependent upon two main factors, which
correspond to the terms in the delta equation of Table 1.1.

Figure 1.1: A plot of activity (perseveration), short-term connection strength and fatigue over a period of about 4 seconds (t = 400) using the original TRACE equations and parameters (Kaplan, Sonntag & Chown, 1991).

The value of activity can be thought of as the percentage of internal units firing at that
point in time. Therefore, the value 1 – P
represents the percentage of units still capable of firing to increase the level of activation
of the assembly as a whole. The first term in the delta equation for P is the rate of growth
of activity and is determined by how sensitive (V) the assembly is to activation due to
fatigue, long-term and short-term connection strength and how many of the internal
members have not yet fired (1 – P):
ΔP_rise = (P + I·(1 − P)) · (1 − P) · V
Alternatively, the term expressing the decline of activity is sensitive to neurons
forming an assembly dropping out due to fatigue (θl) as well as a competitive inhibition
(θc) from other cell assemblies (Milner, 1957):
ΔP_decline = (θ_l·P + θ_c·P·(1 − P)) · (1 − V)
This inhibitory factor was approximated in the original TRACE model because it
is a model of a single cell assembly. MultiTRACE models can explicitly define this
degree of inhibition, and this will become important for the implementation of lateral
inhibition.
The sensitivity (V) of a cell assembly to fire is determined, as noted earlier, by the
strength of internal connections (LTCS), short-term connection strength, and fatigue:
V = (L + S) · (1 − F) · v
Short-term connection strength and fatigue both have update equations that are similar to
activity, that is, they have both growth and decay terms:
ΔS = σ_g·P²·(1 − S) − σ_d·S
ΔF = φ_g·P²·(1 − F) − φ_d·F
The behavior of these two components can be seen in comparison to activity in Fig. 1.1.
Fatigue will become much more important to the model when multiple TRACE units are
embedded in a network. For now, it suffices to know that fatigue is the main factor in
stopping the rise of activation. Short-term connection strength is a factor that increases
the ability of one neuron to activate another when the first initially begins to fire, and it
is substantiated by a set of experiments performed in the 1960s and by more recent
neurophysiological data (Kleinsmith & Kaplan, 1963; Magleby, 1987).
Table 1.1: The system of equations that describe the functional dynamics of a basic TRACE unit. A time step, t, corresponds to approximately 10 ms (Kaplan, Sonntag, & Chown, 1991).

Update equations:
P(t + 1) = P(t) + ΔP(t)
F(t + 1) = F(t) + ΔF(t)
S(t + 1) = S(t) + ΔS(t)
L(t + 1) = L(t) + ΔL(t)
I(t) = α when 0 < t < δ; 0.0 when t > δ

Delta equations:
ΔP = (P + I·(1 − P)) · (1 − P) · V − (θ_l·P + θ_c·P·(1 − P)) · (1 − V)
V = (L + S) · (1 − F) · v
ΔF = φ_g·P²·(1 − F) − φ_d·F
ΔS = σ_g·P²·(1 − S) − σ_d·S
ΔL = 0.0

Variables:
P(t): perseveration (activity)
F(t): fatigue
S(t): short-term connection strength
L(t): long-term connection strength
I(t): external input

Parameters:
θ_l: unit loss
θ_c: inhibitory competition
v: normalization factor
φ_g: fatigue growth
φ_d: fatigue decline
σ_g: STCS growth
σ_d: STCS decline
α: input amplitude
δ: input duration
Because a TRACE unit represents one well-learned cell assembly, it does not
model the alterations of synaptic weights between internal neurons. Therefore, internal
LTCS does not change at all during the simulation:
ΔL = 0.0
Lastly, a cell assembly is activated by a general input, I. This input is a stimulus
that the cell assembly has theoretically represented through the feedback connections in
between the member neurons. The input, therefore, has a magnitude which is the value
of I and a duration that tells the simulator how long to present the stimulus (Table 1.1).
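As a concrete illustration, the difference equations in this section can be iterated directly. The sketch below is a minimal, hypothetical simulation of a single TRACE unit based on one plausible reading of those equations; the parameter values are illustrative placeholders, not the published ones.

```python
# Minimal sketch of a single TRACE unit iterating the difference
# equations of this section. Parameter values are illustrative, not
# the published ones; L (internal LTCS) stays fixed, since delta-L = 0.

def simulate_trace(steps=400, alpha=1.0, delta=10, L=1.0,
                   theta_l=0.05, theta_c=0.05, v=0.5,
                   phi_g=0.015, phi_d=0.004,
                   sig_g=0.1, sig_d=0.02):
    P = F = S = 0.0
    history = []
    for t in range(steps):
        I = alpha if 0 < t < delta else 0.0        # square input wave
        V = (L + S) * (1 - F) * v                  # sensitivity
        rise = (P + I * (1 - P)) * (1 - P) * V
        decline = (theta_l * P + theta_c * P * (1 - P)) * (1 - V)
        dF = phi_g * P**2 * (1 - F) - phi_d * F    # fatigue
        dS = sig_g * P**2 * (1 - S) - sig_d * S    # short-term strength
        P = min(1.0, max(0.0, P + rise - decline))
        F = min(1.0, max(0.0, F + dF))
        S = min(1.0, max(0.0, S + dS))
        history.append((P, F, S))
    return history
```

Plotting P, F and S from this loop gives a rough analogue of Fig. 1.1, though the exact time course depends strongly on the chosen parameter values.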
1.1.2 MultiTRACE
The simplification requirements to successfully model a single cell assembly no
longer apply when TRACE units are set in a network environment and connected with
one another. The interactions of units can result from more “connectionist-like”
mechanisms, instead of being approximated by a mathematical model. Model
components that are improved by a network environment include input, inter-unit
connectivity (LTCS), STCS, and fatigue. Each of these improvements will be
discussed to an introductory level so that the further improvements made in this study can
be understood in the framework of MultiTRACE. It should also be noted that all
MultiTRACE equations are based on Chown’s implementation, which was in turn
derived from the original specification by Sonntag (Chown, 1994; Sonntag, 1991).
First, the input function of a TRACE unit embedded in a network begins to
approach that of a typical connectionist simulation. That is, input is divided into both
excitatory and inhibitory types. Because units are interconnected with other units in
MultiTRACE, both the inhibitory and excitatory input for a given unit is determined by
the classic connectionist summation method (note activity/perseveration is now denoted
as A, to conform to connectionist standards):
I_net = I_exc − I_inh

I_exc = Σ_k input_exc_k(t), where input_exc_k(t) = (w_jk + s_jk) · A_j(t)

I_inh = L · Σ_m input_inh_m(t) + G · Σ_{i=1…N} A_i(t), where input_inh_m(t) = w_jm · A_j(t)

k, m: number of excitatory and inhibitory inputs
N: number of units in a layer
G: global inhibition factor (currently set to 0.5)
L: lateral inhibition factor (currently set to 0.2)
The implication of these changes to input is that units can now be affected and activated
by other units. Psychologically, a connection between two units refers to an association
between two active symbols. The behavior of these units allows for a more complex
interaction between “symbols” than a mere association, however, and this lies at the
essence of the multiTRACE model in explaining psychological phenomena (Kaplan,
Weaver & French, 1990).
Connectivity between units, denoted by w_jk in the traditional manner,
represents the strength with which unit j excites unit k. The unidirectional nature of the
connection stems from the fact that typical neurons are one-way devices. Units also
have a short-term effect (STCS) on other units in a network. Short-term connection
strength (s_jk) between units in multiTRACE is considered to be a separate mechanism
from STCS within a TRACE unit. STCS in TRACE was explained at the level of the
neuron as a temporary facilitation of one neuron’s firing at the onset of rapid firing in
the pre-synaptic cell (Magleby, 1987). It will not be useful to delve into the differences of STCS at the
inter-unit level, because this factor is more important for inter-unit learning which is not
modeled here. Further information about STCS in a multiTRACE environment can be
found elsewhere (Chown, 1994).
Inhibitory input is computed differently from excitatory input in this version of
multiTRACE because it has two components, whereas excitatory input has one. Total
inhibition imposed on a cell assembly in this model can be
decomposed into regional inhibition and local (or lateral) inhibition. Regional inhibition
is a mechanism whereby the spread of activity is controlled and so acts as a negative
feedback system. Regional inhibition is approximated in this model by summing the total
network (or layer) activity and returning this value adjusted by a constant.
Separately, local inhibition on a cell assembly comes from other assemblies in the
region. Inhibitory connections separate from excitatory connections are maintained in the
current implementation of the model and real number weights are used to indicate the
strength of inhibition from the source to a target. Like that of regional inhibition, the
summed total of local inhibition is adjusted by a multiplicative factor.
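Putting the two inhibition components together, the net input to a unit could be sketched as follows. This is a hypothetical convenience layout (tuples of weights and source activities), based on one plausible reading of the input equations in this section; G and L are the global and lateral factors (0.5 and 0.2 in the text).

```python
# Sketch of net-input computation for one multiTRACE unit. The tuple
# representation of connections is a hypothetical convenience.

def net_input(exc, inh, layer_activity, G=0.5, L=0.2):
    """exc: list of (w_jk, s_jk, A_j) excitatory connections.
    inh: list of (w_jm, A_j) inhibitory connections.
    layer_activity: activities of all N units in the layer."""
    I_exc = sum((w + s) * A for (w, s, A) in exc)
    regional = G * sum(layer_activity)            # negative-feedback term
    lateral = L * sum(w * A for (w, A) in inh)    # local competition
    return I_exc - (regional + lateral)
```

Summing total layer activity for the regional term captures the negative-feedback role described above: the more active a layer becomes, the more every unit in it is suppressed.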
1.2 Speech perception
A cognitive model gains support when it can be implemented and applied to some
psychological domain. Additionally, a model can be improved simultaneously because
an application will provide additional constraints that in many cases may clarify
ambiguities or suggest possible new mechanisms. This project was concerned with
modeling a particular aspect of speech perception known as lexical contact. Lexical
contact is defined as the phase whereby the representations activated by the speech input
make initial contact with the lexicon (Frauenfelder & Tyler, 1987). This process, and
speech in general, presents a difficult challenge to cognitive models because the task is
highly temporal in nature. Dealing with sequential stimuli has been a weakness of classic
connectionist models.
1.2.1 Lexical priming
Specifically, the cognitive model described in this introductory section will be
used to simulate and hopefully replicate the results of a particular set of experiments from
the field of psycholinguistics. The data from Slowiaczek and Hamburger (1996) suggest
that phonologically similar words compete at the lexical (word) level of speech
recognition. This competition was observed in a priming paradigm in which the
primes were phonologically related to target words by the number of initial phonemes.
These data provide both architectural and mechanistic constraints on phonological and
lexical processing, and so are crucial to validating this connectionist model. The early
part of the next section will present a thorough overview of this connectionist architecture
and the method by which lexical contact will be simulated using multiTRACE.
2. Speech Simulation
The primary goal for this project was to simulate, as accurately as possible, the
phase of speech perception called lexical contact. The neural model at the heart of the
simulation is motivated by behavioral evidence that constrains its design. One such
constraint is that of hierarchy. Using spoken-word priming experiments, Slowiaczek and
Hamburger (1992) observed two dissociated effects, termed phonological facilitation and
lexical interference, that suggested a connectionist architecture whose lexical and
phonological representations are separate. Facilitation, or a decrease in response time,
was observed during low word-similarity conditions. Alternatively, response times
increased when word similarity increased to three phonemes of overlap. These two results
provided suggestions for different relationships in the hierarchy. In hierarchical systems,
vertical refers to between-level relationships while horizontal corresponds to
relationships between members of the same level. The first effect, phonological
facilitation, supports a design whereby phonological representations excite separate
lexical representations in a vertical manner. On the other hand, the interference effect
suggests that words with similar initial phonological structure inhibit one another
horizontally. With these constraints in mind, a connectionist simulation was developed
that was capable of mimicking the experiments performed by these researchers
(Slowiaczek & Hamburger, 1992; Hamburger & Slowiaczek, 1996).
Several simplifications in the simulation design were necessary in order to focus
on lexical contact. The input stream, which in actuality consists of compressions of air
impinging on the peripheral auditory system, was modeled—as is typically done in the
field—as an incoming stream of phonemes. The assumption of a linear input of discrete
and invariable units is a large one, but was necessary in order to abstract to the level of
analysis used here.
The output of the simulation, given some serial input of phonemes, is a
recognized word. Therefore, this connectionist model is simply a mapping, albeit
complex, between a series of small, temporally-distinct sound units and a unit word
representation. Furthermore, this mapping is encoded in the activity of phoneme units
excited by external input that in turn cause the excitation and subsequent activation of
lexical (word) units:
Speech → phoneme unit activity → lexical unit activity → Word
Given this scheme, there is no reason to require that the units representing
phonemic information directly cause the activation of lexical units. That is, phoneme
unit activity in this model is essentially only a neural implementation of the external
physical stimulus and does not provide additional information to the recognition process.
Additional layers can be inserted into this model without restraint and still the basic
transformative process is retained, whereby phoneme unit activity maps to lexical unit
activity, but perhaps indirectly:
Speech → phoneme activity → intermediate activity → lexical activity → Word
In actuality, much of the human cortex is organized in a manner similar to this
mapping. The mammalian cerebral cortex is organized into six layers, and throughout
the brain it is typically the case that the same layers act as input layers (IV) and others
as output layers (V–VI). Additionally, unlike the inner layers, the outermost layers (I–III)
usually do not receive or send out any long-distance neural processes but instead act more
locally as areas of intermediate processing, receiving information from input layers and
sending projections to output layers (Calvin, 1995). Although this is highly simplified,
this rough approximation of cerebral organization suggests a hierarchical theme for
cerebral processing, one that is directly applicable to the perceptual process under
examination.
The layered connectionist architecture that was used to model lexical contact here
was motivated by the psycholinguistic evidence (Slowiaczek & Hamburger, 1992).
However, an additional layer was inserted into the hierarchy because a 2-tiered network
failed to achieve successful recognition capabilities. As noted above, though,
this does not necessarily conflict with the supporting evidence, because phonological
representations still map, albeit indirectly, to lexical representations. Therefore, the final
architectural design consisted of three distinct layers—phonemic, transitional and
lexical—which will be discussed in the next three sections.
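Structurally, the three-layer design amounts to a pair of feedforward mappings. The skeleton below is a hypothetical sketch of that wiring only; unit dynamics, inhibition, and timing are elided.

```python
# Skeleton of the three-layer architecture: phonemic -> transitional ->
# lexical. Only the feedforward wiring is shown; unit dynamics and
# inhibition are elided.

class Layer:
    def __init__(self, names):
        self.activity = {n: 0.0 for n in names}  # one value per unit

def propagate(src, dst, weights):
    """weights maps (src_unit, dst_unit) -> connection strength w_jk."""
    for (j, k), w in weights.items():
        dst.activity[k] += w * src.activity[j]
```

For example, activating /b/ and /eh/ in a phonemic layer would excite a /beh/ transitional unit in proportion to the connection weights.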
2.0.1 Phonemic layer
The phonemic representation of the speech input is merely an arbitrary
association of short speech sounds with cognitive representations. Therefore, the
phonemic layer in the network consists of cell assembly units that represent, in an
abstract sense, the phonemes of a language, which in this case closely mirror those of
English. Table 2.1 shows the phonemes that were accounted for in the current model,
but it should be noted that the completeness and accuracy of this list are not important,
because ideally linguistic processing should function regardless of actual language
specifics. Phoneme categories were used in the definition of the regular language (a
series of rules, described later, used to build the word maps) used to generate “legal” words in the
lexical layer. It should also be noted that every cell assembly modeled in this particular
system is identical with respect to its parameters, and the crucial ones are listed in Table
2.2. These values were determined experimentally to achieve cell assembly units with a
time-course of activity appropriate for speech perception (high-refresh and fast-acting).
Table 2.1: List of phonemes represented in the modeled language network.
Phoneme category	Represented phonemes
Vowels /ue/, /oo/, /oh/, /uh/, /eh/, /ah/, /ay/, /iy/, /ee/, /oy/
Stop consonants /b/, /p/, /d/, /t/, /g/, /k/
Fricatives /v/, /f/, /th/, /sh/
Nasals /m/, /n/, /nk/, /nd/, /ng/
Glides /l/, /r/
Table 2.2: Layer parameters
Parameter	Value
Fatigue growth 0.15
Fatigue decline 0.04
STCS growth 1.0
STCS decline 0.2
* STCS: Short-term connection strength
Input to the phonemic layer is completely external, reflecting the fact that a more
primary auditory system, not modeled here, provides the innervation (Fig. 2.1). There
is good neurophysiological evidence that this is indeed the case, as the primary auditory
cortex has been found to process complex temporal events in the sound stimulus —
including linguistic sounds— in both monkeys (Steinschneider, Schroeder, Arezzo, &
Vaughan, 1995) and in humans (Liégeois-Chauvel, Laguitton, & Chauvel, 1999). Both
neural imaging experiments (Zatorre, Meyer, Gjedde, & Evans, 1996; Binder, Frost,
Hammeke, Cox, Rao, & Prieto, 1997) and event-related potential (ERP) studies (Celsis,
Doyon, Boulanouar, Pastor, Démonet, & Nespoulous, 1999) also suggest that
phonological representations formed in the left secondary auditory cortex (left
temporoparietal regions) are activated as a result of this initial temporal processing.
Therefore, when speech is being received in the current model, the corresponding
phoneme representation is presented with a square input wave (amplitude = 1.0, duration
= 100 ms) that causes an activation period of about 310 ms. There is no horizontal
connectivity in this layer, and all outgoing connections map onto the transitional layer
which is described next.
Figure 2.1: Input to the phoneme layer is completely externalized here in that it is not directly modeled. Instead, it is assumed that these phonological representations lie in the secondary auditory cortex (or an association area) and receive processed auditory information from the primary auditory cortex.
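The input regime just described (a square wave of amplitude 1.0 lasting 100 ms, with one time step corresponding to roughly 10 ms) can be written as a simple schedule. The function name and onset argument below are illustrative, not part of the original simulator.

```python
# Square-wave external input to a phoneme unit: amplitude 1.0 for
# 100 ms from stimulus onset, 0.0 otherwise. (In the simulation, one
# time step corresponds to about 10 ms, so the duration is 10 steps.)

def phoneme_input(t_ms, onset_ms, amplitude=1.0, duration_ms=100):
    on = onset_ms <= t_ms < onset_ms + duration_ms
    return amplitude if on else 0.0
```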
2.0.2 Transitional layer
When the need for an intermediate layer between the word and phoneme units
arose for reasons discussed later, a decision was required as to what would be represented
in this layer. Because speech is a sequential stimulus and order is relevant, a speech
parser should be very sensitive to this characteristic. This suggests that it may be useful
for an intermediate level of processing to help resolve this temporal order.
Phonetic invariance, the property that a given phonetic sound has acoustical
properties that remain consistent across all its instances in spoken language, is not a
characteristic of human languages (Pisoni & Luce, 1987). For example, the acoustic
signature of a consonant preceded by one vowel may be different from its signature when
preceded by a different vowel. Additionally, the spectral characteristics of this consonant
when it comes before a vowel may look even more different. Therefore, it is quite likely
that humans use this “transitional” information between consonants and vowels to help
determine relative order. This effect was demonstrated by Jenkins, Strange and Erdman
(1983), who showed that the recognition of a vowel in a perturbed stimulus
was most successful when the stimulus retained the consonant-vowel transitional
information and omitted the center of the vowel and the consonants.
Because the acoustic signature of a syllable and its reverse form are different, it is
highly likely that they have separate representations in the cognitive system. For
example, there would be a distinction made between /eb/ and /beh/, if in fact there are
different neural representations for both. This cognitive distinction, when added to the
model, provided a marked improvement in word recognition during early testing. The
improved performance does not come without a cost, however. In the artificial neural
network used here, for instance, the representation of all possible legal biphones (legality
of a word is defined in the next section) adds 380 units to a system containing 27
phoneme units.
The only difference between the two complementary forms of a transitional or
biphone unit (i.e., /bi/ and /ib/) is the input that it receives from the phoneme layer.
Because there is not sufficient evidence to bias either form over its complement, each
connection was made equal in strength. Likewise, lateral inhibition was not applied
either, because there is no evidence to support it at this level. Future research on this
topic might involve implementing this transitional layer with inhibition, however. When
inhibition is not present and the size of the input increases, competition at the transitional
level increases, resulting in even greater competition (due to more word representations
being activated) at the lexical level that may not be resolved correctly. This will be
evidenced by experiments discussed later in which error rates in performance have room
for improvement.
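The phoneme-to-transition decomposition just described can be sketched briefly. The helper below is illustrative only (the string encoding of biphone units is invented for the example), but it captures the two properties the layer needs: each word activates one transitional unit per adjacent phoneme pair, and a sequence and its reverse activate different units.

```python
def biphones(phonemes):
    """Ordered biphone (transitional) units for a phoneme sequence."""
    return [phonemes[i] + phonemes[i + 1] for i in range(len(phonemes) - 1)]

# A sequence and its reverse activate different transitional units:
assert biphones(["b", "e"]) != biphones(["e", "b"])

# A 5-phoneme word (the longest in the modeled lexicon) has 4 transitions:
print(biphones(["b", "l", "a", "s", "t"]))  # ['bl', 'la', 'as', 'st']
```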
2.0.3 Lexical layer
The final layer in the system is, of course, the word or lexical layer. This layer
receives all of its input from the transitional units below and is built by an algorithm that
generates random words from an arbitrarily predefined language that resembles a subset
of the English language (but which also contains words that do not exist in English). All
words are monosyllabic and contain a maximum of five phonemes. This restriction was
made because it is hypothesized that multi-syllabic words add additional complexity to
the recognition process in that they most likely require additional levels of processing.
The strength of a transitional→lexical connection, in this model, is a function of
the serial position of the transition (or biphone) in the word. This aids the simulated
cognitive system in biasing the order of transitions, thereby differentiating between words
that share common transitional units, and is a product of the learning process. Primacy is
a strong property of sequence learning, so using it as a rule for assigning weights seems
plausible. Recency effects of learning sequences are not applicable here, however,
because recency effects are traditionally associated with short-term memory processes
that decay over time, and so do not relate to the long-term connection strength properties
being discussed here (Murdock, 1962).
A rule for building connection weights between the transitional and lexical units
was implemented based upon this theory. The weight of a connection from a transition to
a lexical assembly decays linearly by a constant factor of 1/10 with increasing position in
the word. For the longest words in the modeled lexicon (5 phonemes/4 transitions), the
last transition in the word sequence has ~40% the strength of the primary position.
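One reading of this primacy rule can be sketched as follows. The 1/10 decrement per position is taken from the text; the first-position weight of 0.5 is an assumption chosen so that the fourth transition carries 40% of the primary position's strength, matching the figure quoted above.

```python
def transition_weight(position, w_first=0.5, decrement=0.1):
    """Weight from the biphone at `position` (1-based) to its word unit.
    Decays linearly by a constant 1/10 per position (the primacy rule);
    w_first = 0.5 is an assumed base chosen to reproduce the ~40%
    last-to-first ratio for 5-phoneme (4-transition) words."""
    return w_first - decrement * (position - 1)

weights = [round(transition_weight(p), 2) for p in range(1, 5)]
print(weights)                              # [0.5, 0.4, 0.3, 0.2]
print(round(weights[-1] / weights[0], 2))   # 0.4
```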
2.1 Lateral inhibition and lexical competition
With the general framework for the connectionist model described, the crucial
component and main hypothesis of this research will now be examined. The usefulness
of hierarchy in a cognitive symbol system is that as one proceeds upwards in the
hierarchy, elements of lower levels can be substituted by singular elements. This
substitution, which is similar to the process of “chunking” in learning serial lists, can
facilitate cognitive processing. For example, modality-specific information can be
transformed, through substitution, from information specific to that modality to a general
and common framework of “concepts”. This relates to the general theory of associative
cortex as an area where multi-modal integration occurs.
The flexibility gained through cognitive hierarchies also introduces problems of
control. Specifically, when discussing networks of cell assemblies or populations of
neurons, the control issue lies in restraining the spread of activity. The pattern of
connectivity in neural systems from lower levels to higher levels of a hierarchy is not
necessarily restricted to a many-to-one mapping, but is instead a many-to-many
relationship. In this more complicated type of mapping, a true substitutive effect is not
met with regards to connectivity (Fig 2.2). An additional mechanism is therefore
necessary to produce “winner-take-all” behavior (for a good description of winner-take-
all connectionist networks, see Feldman & Ballard, 1982).
Figure 2.2: In (1) above, elements {a,b,c} and {d} are fully substituted by elements E and F, respectively. It can be seen that all of the elements in the left space are “represented” in the right space and these representations do not overlap. However, in (2), which is a many-to-many mapping, the elements {a,b,c,d} are not represented without overlap as both b and c map to both E and F.
Peripheral visual areas of most animals include a neural mechanism that is useful
to achieve this substitutive goal. Lateral inhibition is an organizational and chemical
mechanism, first observed in the simple eye of the Limulus (e.g., Brodie, Knight &
Ratliff, 1978), capable of increasing the contrast level of an incoming visual stimulus.
Some retinal ganglion cells have receptive fields (essentially, the group of retinal
receptors in a lower level that project to this cell) such that the ring around the center
provides inhibitory input and the center area, excitatory input (Fig. 2.3). Therefore, the
cell is most excited when acute differences in light occur in the stimulus, and not diffuse
patterns. By accentuating these differences in the light stimulus, the cell provides a kind
of filter of the important edge data relevant to interpreting a visual scene.
Figure 2.3: (1) Center-on, off-surround retinal ganglion cell. The cell is most excited when only the center of its receptive field is receiving a light stimulus. The area surrounding this center provides inhibitory input when lit. (2) Shaded portions indicate no light in the corresponding part of the cell's receptive field. (3) The cell's activity (firing frequency) given the shading regime directly above each datapoint.
The importance of inhibition in the neural system is demonstrated by the
incredible success of animal visual systems. Although the lateral inhibition described
above refers to a level-to-level inhibitive mechanism, the same general technique can also
be applied to a same-level paradigm, where adjacent regions of some cognitive space
inhibit their neighboring regions. The effect of this kind of organization is a
focusing of activity, exactly what is needed to approach a winner-take-all type of
behavior.
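A toy simulation can illustrate this focusing effect. The update rule and parameter values below are illustrative assumptions, not the thesis's equations; the point is only that mutual inhibition lets a unit with even a slight excitatory advantage suppress its rivals, approximating winner-take-all behavior.

```python
def step(activity, inputs, inhibition=0.7, decay=0.2):
    """One synchronous update: each unit receives its external input minus
    inhibition proportional to the summed activity of its competitors."""
    new = []
    for i, a in enumerate(activity):
        rivals = sum(activity) - a
        new.append(max(0.0, a + inputs[i] - inhibition * rivals - decay * a))
    return new

activity = [0.0, 0.0, 0.0]
inputs = [1.00, 0.98, 0.95]        # unit 0 has only a slight advantage
for _ in range(50):
    activity = step(activity, inputs)

# Unit 0 dominates; its competitors are fully suppressed.
print([round(a, 2) for a in activity])
```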
2.1.1 Lexicon organization
Psycholinguistics has produced solid evidence suggesting that the lexicon is
organized based upon word structure. One such line of research in the domain of speech
perception uses a task called single-word shadowing to help determine the structural
relationships of words in lexical memory. The stimuli for a single-word shadowing task
consist of two words, a prime followed by a target, both spoken aloud. The
participant’s task is to simply repeat the target word aloud as quickly and as accurately as
possible. Slowiaczek (1994) validated this technique as a means of measuring the access
to lexical memory as opposed to simply accessing an acoustic buffer, by showing that the
recognition of a target was facilitated by a semantically related prime. Utilizing this
technique, two other studies were able to shed light on the organization of the lexicon by
manipulating phonological relationships between targets and primes, and observing how
these relationships affected response time (Hamburger & Slowiaczek, 1996; Slowiaczek
& Hamburger, 1992). The basic experimental design consisted of target words paired up
with primes that varied in the amount of initial phonological overlap (e.g., 2-phoneme
overlap: blast and block). An effect that was consistent across both studies was that the
response time in the 3-phoneme overlap condition (high similarity) was significantly
slower than that of the low similarity conditions (1- and 2-phoneme overlap), but not
necessarily slower than the 0-phoneme overlap condition.
Slowiaczek and Hamburger (1992) proposed a connectionist model consisting of
a prelexical and a lexical level whereby inhibitory connections between phonologically
similar words were strong. This intuitive architectural design would explain the
interference phenomenon observed in the high similarity condition. In the high similarity
condition, the effect of the inhibition is maladaptive because recognition is slower.
However, lateral inhibition at the lexical level should be effective in general by resolving
the competition between words being activated by the speech input. In this paradigm,
lateral inhibition acts as an automatic gain control, where gain in this environment can be
defined as the ratio between activity and incoming input. By suppressing its neighbors
which are providing inhibitory input, a unit can automatically increase its gain if it has
even the slightest competitive advantage in excitatory input.
In the network model discussed here, lateral inhibition is implemented as separate
inhibitory connections between lexical cell assemblies. The level of inhibition is directly
proportional to the degree of initial phoneme overlap (e.g., /blast/ and /blak/ inhibit one
another more than /blast/ and /blue/). If the lexical map is thought of as a space,
phonological overlap in a geometric sense becomes a cognitive distance metric.
Kinsbourne (1982) refers to the phenomenon that similar concepts tend to interfere with
one another more than dissimilar ones as the “cerebral distance principle”.
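This overlap-based scheme can be sketched as follows. Proportionality to initial overlap is stated in the text; the exact scaling (overlap count times a constant factor) is an illustrative assumption for this sketch.

```python
def initial_overlap(w1, w2):
    """Number of shared word-initial phonemes (words given as phoneme lists)."""
    n = 0
    for a, b in zip(w1, w2):
        if a != b:
            break
        n += 1
    return n

def inhibitory_weight(w1, w2, factor=0.7):
    """Lateral inhibition between two lexical assemblies, taken here to be
    directly proportional to initial overlap; the exact scaling (overlap
    count times a constant factor) is an illustrative assumption."""
    return factor * initial_overlap(w1, w2)

blast = ["b", "l", "a", "s", "t"]
blak = ["b", "l", "a", "k"]
blue = ["b", "l", "u"]

# /blast/ and /blak/ inhibit one another more than /blast/ and /blue/:
print(round(inhibitory_weight(blast, blak), 1))  # 2.1
print(round(inhibitory_weight(blast, blue), 1))  # 1.4
```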
2.2 Sampling lexical networks
The implementation described above makes a number of assumptions about the
learning of words and the product of learning—long-term connection strength between
cell assemblies. This is a shortcoming of a model that is not grounded in a physical
stimulus (either simulated or real) which learns weights directly from a developmental
perspective. That is, the networks used in this simulation are mature and were not
trained. Because this is a mathematical model of a cognitive system, the weights will
determine its behavior, and so it is important to make sure that the performance of the
system is consistent across some experimental range of assigned connection values. This
experimental range was determined more or less through trial and error as there was little
information available to use as a gauge of how large this range should be. In spite of this,
a method of assigning weights from a normalized distribution was imposed so that the
notion of a simulated “subject” could be defined and standard statistical analyses could
be conducted.
Each level in the network was assigned mean connection strength values
associated with incoming connections from the previous layer (Table 2.3). These values
were determined experimentally to attain a system that performed with a reasonable error
rate (< 5% error) for a network composed of 169 words. The connection strengths were
sampled from a normal distribution with a standard deviation of ± 5% from these mean
values. The task used in this initial exploration of parameters was to achieve a correct
mapping from phonemic input to lexical activity.

Table 2.3: Mean long-term connection strength values and standard deviations used in the final experimental network architecture.

Connection               Strength   Standard deviation
Phonemic→transitional    0.5        ± 5%
Transitional→lexical     0.3        ± 5%

The lexical unit with the
highest overall activity level during a period of 750 ms after the onset of the stimulus was
deemed to be “recognized”. It should be noted that with a network of approximately
160-250 words, the behavior of the system never approached true winner-take-all in the
general case, although many trials did show a clearly dominant winner. Sometimes, a
“winner” for a trial did not dominate (operationally defined in section 3) very much at all,
although its activity was mathematically the highest.
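The sampling procedure can be sketched as follows. The means and the 5% standard deviation come from Table 2.3; the flat per-layer weight lists and the connection count are simplifications invented for the illustration.

```python
import random

# Mean connection strengths from Table 2.3; each simulated "subject" samples
# every weight from a normal distribution with SD equal to 5% of the mean.
MEANS = {
    "phonemic->transitional": 0.5,
    "transitional->lexical": 0.3,
}

def sample_subject(n_connections, rng=random):
    """Return per-layer weight lists for one simulated subject.
    (The flat per-layer list is a simplification of the real connectivity.)"""
    subject = {}
    for layer, mean in MEANS.items():
        sd = 0.05 * mean
        subject[layer] = [rng.gauss(mean, sd) for _ in range(n_connections)]
    return subject

random.seed(0)                      # one "subject" drawn reproducibly
s = sample_subject(1000)
print(round(sum(s["phonemic->transitional"]) / 1000, 2))   # close to 0.5
```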
2.2.1 “Subject” networks
In order to run experiments, collect data, and conduct statistical analyses, the
notion of a subject was defined. First, let it be noted that every complete network
contained a set of words sharing the same initial phoneme. Modeling the dynamics of
lexical competition was the goal here and the behavioral evidence dictated that
phonological similarity might serve as a distance metric. Therefore, these networks
represent patches of cortex and the scale of the system that could be supported
computationally was constrained to words beginning with the same phoneme.
Subjects in real behavioral experiments are most often randomly sampled from an
assumed normal population. To mimic the variability of a normal population, the
connection weights were randomly sampled from a normal distribution of real numbers,
since in this particular model, the connection strengths are the variables that are
influenced by different life experiences, educational background, et cetera, that would be
found in real human subjects. Because we are assuming normality, it is likely that
subjects drawn from a population of native language speakers will have very similar
representations for monosyllabic words as well as the basic articulated sounds of the
language, so this simulation is well-founded.
Additionally, the set of words represented by any particular network is free to vary
or remain the same across a group of subjects. By varying the word sets between subject
groups and obtaining significant main effects, the confidence of the statistic can be
elevated because it shows that the effect is robust across several datasets. Therefore, it
was the preferred method to use a random word set for one group of subjects, each
having individual network structures with respect to connection weights, and produce a
new randomized word set for another set of subjects.
A goal for this section was to provide the architectural framework for the
neurally-plausible model being proposed. The key architectural decisions—hierarchy
and the distribution of both excitatory and inhibitory connections—were made while
observing the constraints imposed by the behavioral evidence. It will be determined in
the next section whether the network, as defined here, will be capable of effectively
modeling the actual language processing phenomena that have been discussed.
3. Simulating the lexical priming experiments
A motivation for simulating the experiments carried out by Slowiaczek and
Hamburger (1992; 1996) is to provide support for the multiTRACE model which has
good theoretical grounding, but weaker empirical support. Simulation, though artificial,
can be a powerful tool whereby hypothetical mechanisms can be implemented to some
degree of realism and then tested using objective experiments. Additionally, it is the
hope that a simulation effort, if successful in replicating real phenomena, might suggest
additional hypotheses that can be tested further. That is, a simulation becomes extremely
useful if it is shown to have predictive value for subsequent behavioral experimentation.
The first experiment that will be described in this section corresponds to the general
priming experiment designed by Slowiaczek and Hamburger (1992; 1996).
3.1 Experiment 1
Producing a network that performed well was an iterative process, because there
are many factors that affect its behavior. The goal of this initial section is to describe the
first successful experiment that was run, in the sense that parameterization issues had
been solved and the network was performing well. A later section will be reserved for
discussing the additional issues that arose on the way.
3.1.1 Design
Thirty-seven lexical networks were built by the methods described previously
containing representations for 169 words. The words used were monosyllabic and were
standardized to 4 phonemes in length in order to simplify the calculation of response
time. Three different word sets were used between the subjects, but data was not
analyzed with respect to this variable because the representations in the network are
unrelated to actual word labels or sounds, and these are merely arbitrary assignments
given to keep track of their individual behavior. Also, in all experiments, words that
contained redundant phonemes were not used as either a prime or target item because the
results in these cases were problematic. That is, it was not clear how to assign connection
weights between sub-word and word units when more than one connection was required.
Each generated “subject” was tested on 15 target words, which were paired with a
set of three primes that varied in initial phonological similarity with the target. Rather
than using a completely unrelated prime word as the control condition, the
baseline was a trial that did not include a prime. This was required because the networks
contained only representations of words with the initial phoneme in common. Again,
these networks model patches of cortex, not the entire lexicon.
As in the behavioral procedure, the prime word during each trial was “spoken”
aloud 500 ms (modeled time) before the target word, measuring time from the end of the
prime stimulus. Each word tested was broken down into its constituent phonemes and
presented to the network at a standard rate of 1 phoneme every 100 ms. Response time
was recorded on successful trials; since the ability to speak the target aloud was not
within the simulation’s capabilities, an alternative measure was used. An error-free trial
was defined earlier as one in which the correct representative cell assembly attained the
highest activity level within a 750 ms window after the onset of the target item. The
response time, therefore, was measured from the onset of the target item to the time at
which the target cell assembly was the most active representation for three consecutive
time steps. This window of domination corresponds to 30 ms. Several observation
sessions were performed by the experimenter to verify that this was a good objective
measure.
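The recognition criterion just described can be sketched as follows. The 10 ms step, the 750 ms window, and the three-consecutive-step rule come from the text; the data structure and the choice to report the onset of the dominance run are assumptions made for this illustration.

```python
STEP_MS = 10                      # one simulated step = 10 ms

def response_time(traces, target, window=3, max_steps=75):
    """traces: dict mapping unit name -> activity per step, aligned to target
    onset. Returns the RT in ms, or None (an error trial) if the target is
    never the single most active unit for `window` consecutive steps within
    the 750 ms (75-step) window. The RT reported here is the onset of the
    dominance run, one of two readings of the criterion."""
    run = 0
    for t in range(min(max_steps, len(traces[target]))):
        winner = max(traces, key=lambda u: traces[u][t])
        run = run + 1 if winner == target else 0
        if run == window:
            return (t - window + 1) * STEP_MS
    return None

traces = {
    "brof":  [0.1, 0.2, 0.5, 0.6, 0.7, 0.8],   # target; overtakes at step 2
    "brosh": [0.3, 0.4, 0.4, 0.3, 0.2, 0.1],   # prime-like competitor
}
print(response_time(traces, "brof"))   # 20
```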
The parameters for this first experiment are summarized in Table 3.1, along with
the inhibition constant that was used. The inhibition constant is a multiplicative factor
used in addition to phonological overlap to determine the strength of inhibition between
two lexical assemblies.
Table 3.1: Parameters for Experiment 1. Inhibition factor is a constant multiplied by the inverse of cognitive distance to determine inhibition strength between two lexical cell assemblies.
Parameter                                  Value
Phoneme interval                           100 ms (10 simulated steps)
Target and prime length (in phonemes)      4
Number of subjects                         37
Number of repetitions per subject          15
Number of words represented                169
Inhibition factor                          0.7
3.1.2 Results and discussion
A typical trial run demonstrating lexical competition during the presentation of
both the prime and the target is presented in Figure 3.1. In this particular trial, “b-r-oh-
sh” was the prime word and “b-r-oh-f” was the target. The activity of the lexical units
clearly shows that the trial was successful in that recognition was correct.
The summary information for Experiment 1 is displayed in Table 3.2. The error
rate, which is expected to be high because the heuristic used for determining success is
not perfect, was fairly consistent across all groups, and so it was not tested further.
However, it should be noted that only the response times of successful trials were
included in the analysis. Additionally, for each subject tested there were four condition
means calculated from the 15 replications of each condition.
Figure 3.1: The activity levels of active lexical units during a trial run are presented. The onset of the prime (“b-r-oh-sh”) and target (“b-r-oh-f”) are indicated by asterisks (*) below the x-axis. The incompleteness of the curves at t < 350 is the result of a recording unit that only begins recording a lexical unit’s activity once it surpasses a certain threshold, for computational reasons.
Table 3.2: Mean reaction time and error rates as a function of priming condition.

Condition            RT      Error rate
No prime             39.87   .11
1-phoneme overlap    37.98   .09
2-phoneme overlap    37.74   .09
3-phoneme overlap    35.24   .10
A one-way analysis of variance (ANOVA) was conducted in order to reveal any
statistically significant main effects of priming condition. The main effect of priming
condition was indeed significant [F(3,144) = 17.49, p < 0.0001]. It was also determined
that the 1-phoneme condition was different from the no prime condition and the 2-
phoneme condition was different from that of the 3-phoneme overlap condition [t(36) =
4.77, p < 0.00001], [t(36) = 3.79, p < 0.0006]. That is, phonological overlap produced
a facilitatory effect and this effect increased as overlap increased.
The facilitation observed in this experiment for all phonological overlap
conditions is not consistent with Slowiaczek and Hamburger (1992; 1996). However, in
the earlier study, facilitation was observed in the low similarity case and at the time was
attributed to a prelexical effect. That is, because the target words in these overlap
conditions are sharing input from phoneme units with the primes, they too are being
partially stimulated as the prime is being activated. Therefore, when the target word is
actually spoken, the subject is able to respond more quickly. The later study attributed
this facilitation to strategic processing whereby the subject, upon noticing the
relationships between many of the primes and targets in the trials, begins to make
expectations that this will again be the case in future trials. In this later study, expectancy
was controlled by manipulating the number of trials containing phonological overlap
across subject groups, a variable termed the phonological relatedness proportion (PRP).
Only groups containing a majority of trials with phonological overlap (high PRP,
75%) displayed facilitation effects (Hamburger & Slowiaczek, 1996).
Because the results presented here do not agree with the behavioral
evidence, it appeared that this architecture, as currently implemented, was not correct.
Specifically, because facilitation increased with phonological overlap even as similarity
became high (3-phoneme overlap), the results suggested that the level of inhibition was
perhaps too low. A follow-up experiment was performed, adding the level of inhibition
as an independent variable so that its effect on the performance of the network could be
analyzed.
3.2 Experiment 2
3.2.1 Design
The experiment as described in Section 3.1.1 was repeated for four different
inhibition levels: 0.7, 0.9, 1.7, 2.0. These values are used to scale all of the inhibitory
weights assigned between lexical units in the simulation. One-hundred and forty-eight
subjects were used (37 for each inhibition group) in total. The previous data from
Experiment 1 was used as data for the inhibition level of 0.7 in this experiment. Network
parameters were also exactly the same as in Experiment 1 (see Table 3.1).
3.2.2 Results and discussion
The error rates of the networks as a function of inhibition and overlap are
presented in Figure 3.2. Error rate appears to increase with the level of inhibition,
although this trend is not consistent across all inhibition levels. Because the error
rates are relatively consistent across the overlap conditions, this is unlikely to
compromise the analysis of response time across that variable, and so it was not
analyzed further. It should be noted, however, that the problem may involve
diminishing levels of activity as the speech “signal” is passed upwards through the levels
(see section 4.3). Additional research will have to be done in order to discover a way to
control this effect.
As was done in the first experiment, all trials containing an error in any of the
conditions were discarded from the condition mean calculation for each subject. Since
there was adequate replication at each condition level, it is plausible to continue with the
analysis of response latencies as a function of inhibition level. Figure 3.3 presents a
summary of these data.
Figure 3.2: Mean error rates across prime conditions, grouped by lateral inhibition level.
A two-way analysis of variance was used to investigate the possible main effects
of priming condition and lateral inhibition, and more importantly, determine if there is an
interaction. Significant main effects of both priming condition [F(3, 576) = 23.972, p <
0.001] and lateral inhibition [F(3, 576) = 23.839, p < 0.001] were indeed found.
Additionally, there was a significant interaction between the two factors [F(9, 576) =
8.000, p< 0.001].
The significant interaction was decomposed further by running four one-way
ANOVAs across the four priming conditions for each level of inhibition. There was a
significant main effect of priming condition for Inhibition 1 [F(3,144) = 17.49, p <
0.001] and Inhibition 2 [F(3, 144) = 19.78, p < 0.001]. Alternatively, there was no main
effect of priming condition for Inhibition 3 [F(3,144) = 2.18, p = 0.09], but there was a
main effect for Inhibition 4 [F(3,144) = 4.70, p < 0.004]. Follow-up t-tests determined
that only in the Inhibition 4 case was the 3-phoneme overlap condition slower than the
1-phoneme overlap condition when using a two-tailed t-test [t(36) = 4.52, p < 0.0001].

Figure 3.3: Mean response times for all priming conditions grouped by levels of lateral inhibition. The fourth inhibition condition (Inh. = 2) replicates the findings of Slowiaczek & Hamburger (1992).
The 1- and 3-phoneme overlap mean difference in the Inhibition 3 group did not reach
significance [t(36) = 1.76, p = 0.09]. Additionally, the 3-phoneme overlap condition was
not significantly slower than the 2-phoneme overlap [t(36) = 1.47, p = 0.15], a result that
agrees with the data of Hamburger and Slowiaczek (1996). Finally, in all levels of
inhibition, response time was faster with 1-phoneme overlap than with no prime;
Inhibition 1: [t(36) = 4.77, p < 0.0001], Inhibition 2: [t(36) = 4.98, p <
0.0001], Inhibition 3: [t(36) = 3.46, p < 0.002], Inhibition 4: [t(36) = 4.73, p < 0.0001].
The significant interaction between inhibition level and priming condition was the
result that was needed to provide support to the theory that lateral inhibition at the lexical
level is the mechanism responsible for high similarity interference (Hamburger &
Slowiaczek, 1996). At lower levels of inhibition, lexical competition never resulted in
interference, and there was always an increasing facilitatory effect as the number of
overlapping phonemes increased. Only when inhibition was raised to a sufficiently high
level (somewhere between 1.7 and 2.0) did the interference phenomenon have a
significant impact.
As this study deals with a simulated model of a particular aspect of speech
perception and is therefore markedly simplified, it would not be useful to belabor
all of the inconsistencies between the actual behavioral evidence and the
results reported here. The larger picture that can be drawn from this simulation is that
inhibition is likely to be responsible for the high similarity interference. Additionally,
across all inhibition levels, low similarity facilitation occurs, so it can be concluded that
the two effects are dissociable—most likely the result of two different processes. In this
model, low similarity facilitation occurs because of the similarity in phonemic inputs
between related primes in the absence of strong lateral inhibition at the lexical level.
4. Other modeling issues
Inhibition at the lexical level was shown to be the primary factor affecting
competition in the simulated experiments described in the last section. As the level of
inhibition increased, the response time of the target unit increased in the high similarity
condition. Because this simulation is perfectly controlled, inhibition can be implicated as
the causal factor underlying this high similarity interference. However, because this is a
theoretical simulation, this result cannot automatically be generalized to actual human
beings, but it does provide good support for inhibition as a mechanism.
In this section, other issues that arose during experimentation will be discussed
and their effects on network performance and lexical competition will be examined. If
there is a reasonable explanation for why something was problematic, it will be
discussed; otherwise, it will be left as a topic for future research.
4.1 Speech speed
The first major factor that was observed to have a dramatic effect on network
accuracy was the phoneme presentation interval. To reiterate, this was the length of time
that elapsed between the sequential stimulation of the phonemic units that comprised a
speech stimulus. In the final experiments, this interval was clamped at 100 ms. An early
set of experiments was run, manipulating this presentation interval to see if there was any
observable trend in error rate (Fig 4.1).
At first observation, this performance trend would suggest that the network was
only capable of handling a certain range of speech speeds adequately. Speech speeds that
fell outside of this range saw a marked drop in recognition accuracy. It may be postulated that
this is, in fact, a weakness of the current network model. Perhaps it is not robust enough
for this particular application, because humans seem to be very adept at understanding
many speeds of speech.
An alternative view, however, is to allow for the possibility that the modeled time
step in the simulation is not perfectly scaled to its real-world counterpart or that the
different cognitive levels simulated have different time scales. The simulated time
interval was the result of a modeling effort by Kaplan et al. (1991) to support Miller
and Marlin's (1984) empirical findings suggesting that 5 seconds was the span of time
after the presentation of an item during which information may be consolidating.

Figure 4.1: Pure error rate (target unit was overall winner) vs. phoneme interval. A quadratic trend was applied to the data points and its fitness was moderately good (R² = 0.9201). The minimal error rate occurred when the interval was set to 100 ms, so this setting was used during the priming experiments.

The
original designers of TRACE assumed that the model would have to be improved as the
network model was tested further. This suggestion to reinvestigate the scaling of time
in the model is highly speculative, but it may be worth the effort. If, however, time is
actually slower in the model than in reality, then the range of phoneme intervals where
performance was adequate would be increased, according to the trend depicted in Figure
4.1.
4.2 Sequence bias
In section 2.0.2, it was suggested that a two-tiered model is incapable of
representing the highly sequential nature of a word stimulus. There must be some way to
encode the notion of order, because otherwise a word would merely be a set of phonemes
in which sequence did not matter. Because there are many instances in human languages
in which the meaning of a word is altered when constituent units are scrambled, it is
obvious that some mechanism or organization exists to account for the inherent sequence
of a word. The transitional layer was introduced in section 2 as a means to overcome the
initial problem of encoding sequence. However, an alternative mechanism could encode
this bias at the transitional-lexical interface. In such a scenario, a method of assigning
weights would take the position of the transition in the word into account and in essence
penalize transitions occurring late in the word. Tests showed that this approach was
effective in achieving successful word recognition, and it was supported by the
evidence for recency effects in learning serial lists, but an alternative hypothesis may also
be correct.
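The positional weighting scheme just described can be sketched as follows. The exponential form and the decay rate are assumptions for illustration, not the thesis's exact parameters: each transition (adjacent phoneme pair) projects to the word's lexical unit with a weight that shrinks the later the transition occurs, encoding sequence bias at the transitional-lexical interface.

```python
import math

def transition_weights(phonemes, decay=0.5):
    """Assign a transitional-to-lexical connection weight to each adjacent
    phoneme pair, penalizing transitions that occur late in the word."""
    weights = {}
    for position, pair in enumerate(zip(phonemes, phonemes[1:])):
        weights[pair] = math.exp(-decay * position)  # positional decay
    return weights

# /blap/ -> transitions /bl/, /la/, /ap/ with monotonically decreasing weights
w = transition_weights(['b', 'l', 'a', 'p'])
```

Because the test word sets excluded words with duplicate transitions (section 4.4), each pair maps to a single weight here without loss.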
This alternative may be that an even more complicated hierarchy exists (Fig. 4.2).
That is, it is possible that two or even three layers could exist between the phonemic level
and the lexical level. Additionally, the lexical level modeled here is also likely to be the
root of a higher hierarchy that includes support for polysyllabic words. The relatively
high overall error rate of this model, which hovered close to 10%, perhaps provides support
for this hypothesis. When the network fails to make a correct prediction, it is because other
words that received the same degree of input are competing for dominance. It was not
always the case, either, that the winner of the competition in these unsuccessful
Figure 4.2: Other hierarchical possibilities for the lexicon. The levels drawn with solid lines correspond to levels represented in the current model. One possible layer in between the transitional and monosyllabic word layers that makes linguistic sense is a syllable layer.
trials was phonologically similar to the target. For example, suppose three words are
receiving comparable input across some region of the lexical map. Two of the three are
near one another in this region and so are inhibiting each other. The third competitor,
because it is receiving excitatory input on par with these two competing neighbors but
lies outside their inhibitory neighborhood, is likely to win the competition.
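This scenario can be illustrated with a toy iteration. The update rule and all parameter values below are hypothetical choices for the sketch, not the simulator's actual equations: two near neighbors inhibit one another while a third, equally driven unit escapes inhibition.

```python
def simulate_competition(steps=50, drive=0.1, inhibition=0.7, decay=0.1):
    """Words A and B are near neighbors on the lexical map and laterally
    inhibit one another; word C receives comparable excitatory input but
    lies outside their inhibitory neighborhood."""
    a = b = c = 0.0
    for _ in range(steps):
        a_next = a + drive - decay * a - inhibition * b
        b_next = b + drive - decay * b - inhibition * a
        c_next = c + drive - decay * c  # no lateral inhibition reaches C
        a, b, c = max(a_next, 0.0), max(b_next, 0.0), max(c_next, 0.0)
    return a, b, c

a, b, c = simulate_competition()
```

The mutual inhibition holds A and B at a low equilibrium while C climbs unopposed, so C wins the competition despite receiving no more input than its rivals.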
These types of situations occurred in the simulation here because the connection
weights were being chosen from a normal distribution of real numbers. Although its
probability is low, it is possible to assign a high weight value to a particular connection
that would typically and correctly only receive a low value. Because the real language
system is particularly effective in very noisy environments, the vulnerability witnessed
here in the simulated environment most likely does not exist in reality. Instead, it is
likely the case that more intermediate representations exist to further encode sequential
bias into the final representation of the word at the lexical level. The method of assigning
weights between the transitional and lexical layer was adopted at first because it was
effective and because it was computationally feasible. It was noted earlier that the
number of transitional units in the second layer was the number of legal combinations of
phonemes in the language. A second intermediary layer on top of the transition layer
would require N more units, where N is the number of legal combinations of transitional
elements. This combinatorial explosion made the implementation of a secondary
transitional layer infeasible using the current research simulator.
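The combinatorial growth can be made concrete with a small count. The legality rule below (any ordered pair of distinct elements) is a stand-in; the thesis's phonotactic constraints are more restrictive, but the scaling behavior is the same.

```python
def layer_sizes(num_phonemes, legal=lambda a, b: a != b):
    """Count the units needed at each level: the transitional layer needs
    one unit per legal phoneme pair, and a second intermediary layer would
    need one unit per legal ordered pair of transitions."""
    phonemes = range(num_phonemes)
    transitions = [(a, b) for a in phonemes for b in phonemes if legal(a, b)]
    n = len(transitions)
    return n, n * (n - 1)  # second-layer size if all ordered pairs were legal

first, second = layer_sizes(10)
```

Even a modest inventory of 10 phonemes yields 90 transition units under this rule, and a further layer over those transitions would require thousands more, which is the explosion that ruled out the implementation.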
4.3 Error rate and lateral inhibition
The data from Experiment 2 showed a trend where raising the level of lateral
inhibition increased the number of errors in word recognition. This increase in errors was
not specific to any of the priming conditions, however. Because lateral inhibition acts to
reduce competition and thereby focus activity, it should improve performance accuracy.
This was not the case, however, and so the issue must be
addressed.
Initial experimentation with networks built using a higher level of inhibition
suggested that the error-prone performance was due to a gain problem. That is, as more
inhibition was present at the lexical level due to lateral inhibition (not just regional
inhibition) the level of activity relative to incoming input was globally reduced in this
layer. An additional analysis of variance was conducted on a random sampling of 2080
trials from each inhibition level to determine whether this effect was actually significant (means
are plotted in Figure 4.3). As expected, there was a significant main effect of inhibition
on the level of lexical assembly activity (the activities of the target assemblies were used
from both successful and unsuccessful trials) [F(3, 8316) = 401.63, p < 0.0001].
Follow-up t-tests indicated that all group means were significantly different, 0.7/0.9:
[t(2080) = 6.07, p < 0.05], 0.9/1.7: [t(2080) = 18.84, p < 0.05], 1.7/2.0: [t(2080) = 3.92, p
< 0.05]. This suggests that the stronger level of inhibition is causing the
Figure 4.3: The maximum lexical target activities are plotted across levels of inhibition (0.7, 0.9, 1.7, 2.0). All means plotted were significantly different (see text).
speech signal, once it reaches the lexical level, to be weaker, and not sufficient to cause
excitation leading to a dominant lexical winner. Future research will have to address this
issue and reverse its effect.
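One candidate remedy for this gain problem (an assumption on my part, not a mechanism the thesis implements) is divisive normalization: rescale the lexical layer's post-inhibition activity to a constant total, so that raising lateral inhibition sharpens the competition without globally depressing the layer's response to the speech signal.

```python
def normalized_lexical_activity(excitation, inhibition):
    """Each unit is suppressed by the summed activity of its competitors,
    then the surviving activity is rescaled to a constant total, so the
    overall gain of the layer no longer falls as inhibition rises."""
    total_excitation = sum(excitation)
    net = [max(e - inhibition * (total_excitation - e), 0.0) for e in excitation]
    remaining = sum(net)
    if remaining == 0.0:
        return [0.0] * len(net)
    return [n / remaining for n in net]

weak = normalized_lexical_activity([0.4, 0.3, 0.1], inhibition=0.5)
strong = normalized_lexical_activity([0.4, 0.3, 0.1], inhibition=0.9)
```

Under this scheme, stronger inhibition concentrates the fixed activity budget on the leading unit rather than weakening the whole layer, which is the behavior the data in Figure 4.3 would call for.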
4.4 Word representation
When building the test word sets for the experiments, a simplification was made
to remove all words that had duplicate phonemes or transitions (e.g., /blalb/), although
words with these characteristics were still represented in the network. This was done to
avoid any confounding variables that might have been created by incorrectly representing
the connections that define these words. At first this problem was addressed by imposing
a rule whereby secondary and tertiary connections from a transitional unit to the same
lexical unit were weaker than the primary connection. This initial solution, before it was
removed, was rationalized by learning theory.
Hebbian learning with cell assemblies can be characterized as a resource problem,
because underlying the associations between assemblies are physiological factors which
include the number of synapses between the neuronal members of a cell assembly.
Neurotransmitters need to be produced for each synapse, and so the active status of a
synaptic cleft is determined in part by how fast each cell can produce and transport these
chemical messengers to the terminus of each axon. Additionally, the process by which
learning occurs, which has been postulated to be the result of a long-term postsynaptic
membrane change, is also rate-limited by protein synthesis (in order to build
membrane channels, etc.). These resource limitations are likely to place a ceiling on the
strength of the connection between two cell assemblies. Therefore, if a transitional unit is
already associated with a word unit, its second association is likely to be limited in
strength because they are sharing the same underlying neuronal connections or because
the number of synapses a given cell can support is limited. Because a complete language
system will need to address this issue, additional empirical evidence is sought to provide
a sound basis towards any particular implementation.
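The resource-limited learning rule discussed above can be sketched as a shared synaptic "budget". The update below, including its parameter values, is a hypothetical illustration rather than the rule the thesis actually imposed and later removed: all connections leaving one transitional unit draw on a fixed capacity, so once the primary connection has claimed most of it, secondary and tertiary associations are necessarily weaker.

```python
def add_association(weights, pre, post, increment=0.3, capacity=1.0):
    """Resource-limited Hebbian update: strengthen the (pre, post)
    connection by `increment`, but never let the summed weight of all
    connections leaving `pre` exceed `capacity`."""
    used = sum(w for (p, _), w in weights.items() if p == pre)
    available = max(capacity - used, 0.0)
    weights[(pre, post)] = weights.get((pre, post), 0.0) + min(increment, available)
    return weights

w = {}
add_association(w, 'bl', 'blab', increment=0.8)  # primary connection
add_association(w, 'bl', 'blub', increment=0.8)  # secondary: budget nearly spent
```

The cap stands in for the physiological limits named above: neurotransmitter production and the number of synapses a cell can support.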
4.5 Typicality effect
Typicality in a TRACE unit can be modeled by increasing internal long-term
connection strength, which represents the strength of synaptic efficacy between the
neuronal members of the assembly. Frequently used linguistic components (phonemes,
words, or composites of these) are likely to have tightly coupled internal
structures, refined through experience, that allow the assembly to reverberate with less
excitatory input. Given the same amount of excitatory input, an assembly with a highly
developed internal structure as opposed to one that is weaker internally will achieve a
higher activation level, thereby having a greater excitatory or inhibitory effect on
assemblies with which it is connected. Although this seems like a reasonable
implementation, typicality was not modeled here in order to reduce the chance for
confounding factors to be introduced.
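Though typicality was not modeled, the mechanism described above can be sketched in a few lines. The update rule and parameter values are assumptions for illustration: an assembly's internal long-term connection strength feeds its own activity back each step, so a well-practiced assembly reaches a higher activation level from the same external input than a weakly structured one.

```python
def assembly_activation(external_input, internal_strength, steps=20, decay=0.2):
    """Iterate an assembly's activity: each step it decays, receives
    external input, and is re-excited by its own members in proportion to
    the internal (recurrent) connection strength. Activity is capped at 1."""
    activity = 0.0
    for _ in range(steps):
        activity = min(1.0, (1.0 - decay) * activity
                       + internal_strength * activity
                       + external_input)
    return activity

weak = assembly_activation(0.05, internal_strength=0.05)    # infrequent word
strong = assembly_activation(0.05, internal_strength=0.15)  # typical word
```

The higher plateau reached by the strongly coupled assembly is what would give typical words their greater excitatory or inhibitory influence on connected assemblies.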
5. Conclusion
A key feature of this simulation is that a particular problem in cognition (in this
case word recognition) can be approached without regard to content (Kaplan et al.,
1990). The cell assemblies in this simulation are given arbitrary labels so that they can
be interpreted as phonemes or words in a discussion, but the underlying computational
output of the system attends only to spatial relationships. Lateral inhibition
between cognitive assemblies was computed based upon the distance between the
corresponding assemblies. This distance was inferred from a phonological structural
relationship, of course, but this in turn was discovered through behavioral testing. The
structure-sensitive processing on display here is particularly interesting because it extends
naturally to a general theory of cortical function.
If one compares, for the moment, the human cortex and a digital computer’s
memory, it will immediately be apparent that they cannot possibly operate in the same
fashion. There is general support for the idea that the information of a cognitive “memory” is stored
in the weight matrix of synaptic efficacies between neurons. That is, the biochemistry at
each neural synapse has been modulated over time and through perceptual experience so
that the collective populations of neurons form an internal representation, or cognitive
map, of the outside world. In a digital computer, information stored in memory is
typically transferred to a register—a transient storage location that is local to the CPU
and therefore very fast to access—before computation is performed on it. This is simply
not a possibility in a brain, because this “register” would have to be composed of neurons
as well, and the time it would take to emulate the information contained in the synaptic
matrix of a particular area of memory in a temporary structure for separate processing
would be metabolically impossible, given the time constraints (Amit, 1995). Instead, the
memory must be accessed “in-place”, and it is because of this that a flexible set of
content-independent yet spatially-based operations becomes useful.
The cognitive process under study here was speech perception. Of particular
interest is the organization of the internal representations of words, phonemes and other
linguistic structures in the area of memory dedicated to language. Behavioral evidence
suggests that a relationship exists, based upon word structure, that causes a pattern of
interference in response times indicative of some kind of competition (Hamburger &
Slowiaczek, 1996; Slowiaczek & Hamburger, 1992). Using a computer simulation, a
series of experiments in this study showed that local inhibition between word units
restricted to the lexical level could account for this interference.
How do these results fit into a grand theory of cognition? In order to begin to
answer this question, it is necessary to consider Hebb’s original cell assembly hypothesis
again and how this might relate to the development of the language system (Hebb, 1949).
Hebb originally recognized the need for some kind of structure that would initially
respond to a stimulus then reverberate for some time after the stimulus is gone. Although
at the time he was not a neurophysiologist but instead a psychologist, he speculated that it
would be adaptive for two neurons that respond synchronously to a given external
stimulus to affect one another metabolically in such a way so that on the next
presentation of the same stimulus, there would be a stable and more unitary response.
More than 40 years later, neurophysiologists are beginning to find evidence for these
localized patterns of reverberation in macaques after being trained to respond to different
stochastically-generated visual images (Miyashita, 1988; Miyashita & Chang 1988;
Sakai & Miyashita, 1991). The key results in these experiments relevant to the
discussion here were that the reverberations were consistently identifiable, lasted for as
long as 16 seconds, and were localized in cortical space (about 1 mm² of anterior ventral
temporal [AVT] cortex) (Sakai & Miyashita, 1991). All of this evidence lends great
support to Hebb’s construct of a cell assembly (Amit, 1995), or at least to some primary
component of a complete cell assembly that may be more distributed
in nature. However, for the types of representations discussed in this project, especially
phonological representations, it is most likely the case that they are
concise and localizable, not requiring additional sensory information that may be
obtained from transcortical regions.
Furthermore, Hebb’s understanding of learning in the brain can be applied here to
link the behavior of lexical systems discussed in earlier sections with their initial neural
development. The associative areas of cortex are typically referred to as areas
responsible for multimodal integration—a junction for many types of sensory inputs.
These areas are likely to be sufficiently plastic, initially, to configure themselves to
receive a wide variety of inputs and to form different types of internal representations
(Rauschecker & Sejnowski, 1994). It is possible, therefore, that the pattern of responses
they generate initially, due to some cross-cortical input, are more or less random,
especially when the animal is at an early age. From this point, however, Hebb’s learning
rule can explain how related responses could begin to be passively “recognized” by the
neural system and used as a signal to build more structure between the neurons involved
in a response. With repeated exposure to the stimulus, these same neurons that responded
initially will continue to respond, and their activity will begin to be supported by
additional excitatory input from members of a forming cell assembly. Eventually, the cell
assembly may become so well-formed that its members respond more or less as a unit and
reverberate for a longer duration than they would be capable of in isolation from each
other.
However, consider the scenario of a new stimulus now produced in the
environment which is very much similar to the previous stimulus for which an internal
representation has already been formed. How different does this stimulus “appear” to the
neural system? That is, how similar is the excitation for the neuronal members of the cell
assembly whose development was discussed above? It is quite likely that this new
stimulus is perhaps the old stimulus, but viewed from a different vantage point.
However, because we are dealing with association areas, it is reasonable to assume that
object feature detection is accomplished at lower levels of the cognitive system and
remains constant, unless the perspective has changed drastically (Amit, 1995). Even if
this is not the case, multiple representations of the same object are not an impossibility in
association areas, as long as it is not required that every possible configuration be
represented, because then this becomes an impossible storage-space problem. The
question being asked is related to the general problem of category formation, and how
this in turn relates to spatially-aware cognitive processing.
Suppose some cell assembly, CA, is well-formed and responds to a particular
stimulus A. Another novel stimulus B is introduced, and it shares many of the
characteristics of A, but it also has features that do not fit with the feature set of stimulus
A. It is likely that neurons that are not members of CA in addition to some subset of CA
will respond to B. Let these neurons external to CA plus the neuronal population that
respond to both stimuli be population CB. If this is indeed how cell assemblies develop,
then inhibition (together with short-term connection strength; see section 1.1.1) is a
mechanism that could help two distinct population responses form from these two
sufficiently different stimuli. The neural members that respond to both can act as
members of both assemblies (Rauschecker, 1995). However, the other respective neural
members can be made distinct in the long-term through inhibitory connections.
Consider what would happen if one particular stimulus, perhaps stimulus A, was
presented and inhibitory connections did not exist. CA neurons would begin to activate,
and the neurons whose membership spanned the two cell populations would also be
excited, either directly by the stimulus or through their recurrent connections with the other
members of CA. Because CB is now also receiving internal excitatory input through this
shared population, it too would begin to activate, perhaps more slowly. The result of this
stimulus presentation is a more diffuse response than is appropriate.
If inhibitory connections were now installed in this system between the separate
non-overlapping members of both cell assemblies CA and CB, the representations could
function distinctly. The inhibition prevents both cell assemblies from activating in their
entirety and so definite perception is achieved. This definiteness is implemented with
efficiency too, because it does not require complete distinction between members of a
cell assembly.
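The CA/CB scenario above can be run as a toy simulation. All parameter values here are hypothetical choices for the sketch: stimulus A drives CA's private members and the members shared by CA and CB, and the comparison is between a network with and without an inhibitory link between the assemblies' non-overlapping members.

```python
def present_stimulus_A(inhibition, steps=30, drive=0.2, decay=0.3, coupling=0.4):
    """Track three populations while stimulus A is presented: CA's private
    members, the members shared by CA and CB, and CB's private members.
    Without inhibition, excitation leaks through the shared population into
    CB's private members, producing the diffuse response described in the
    text; an inhibitory CA -> CB link keeps the representations distinct."""
    ca_private = shared = cb_private = 0.0
    for _ in range(steps):
        ca_next = (1.0 - decay) * ca_private + drive + coupling * shared
        sh_next = (1.0 - decay) * shared + drive + coupling * ca_private
        cb_next = ((1.0 - decay) * cb_private + coupling * shared
                   - inhibition * ca_private)  # inhibitory CA -> CB link
        ca_private = min(ca_next, 1.0)
        shared = min(sh_next, 1.0)
        cb_private = min(max(cb_next, 0.0), 1.0)
    return ca_private, cb_private

diffuse_ca, diffuse_cb = present_stimulus_A(inhibition=0.0)
sharp_ca, sharp_cb = present_stimulus_A(inhibition=0.6)
```

With the inhibitory link in place, CA activates fully while CB's private members stay silent, so definite perception is achieved even though the two assemblies still share a neuronal population.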
The lexical representations that were modeled here are conceivably developed in
the fashion outlined above. The behavioral evidence suggests this to be the case in that a
structural relationship was discovered that was linked with interference in response times
during a task where these representations were being accessed (Hamburger &
Slowiaczek, 1996). The results presented here provide support for the idea that this
interference phenomenon results from inhibition between phonologically similar lexical
representations. It is possible that physical proximity in lexical cortical areas is an
artifact of the continual evolution of phonological and lexical representations in early
development (i.e., /ta/ begins to be differentiated from /da/). That is, early
representations for language sounds may split when a single categorical representation no
longer remains consistent with learning signals being received from the environment (a
realization through some teaching cue that there exists both a “toe” and a “doe”, for
example, and that they are separate objects). These representations may become more
discrete using the inhibitory technique described above. It should be emphasized that this
solution seems optimal with respect to representational space because the commonality of
features is retained in the sharing of neuronal members between representations. This is
highly speculative, however, and would need to be tested empirically, but the recent
success in finding neurophysiological correlates of reverberatory modules (Amit, 1995)
suggests that methods for doing this are fast approaching.
Acknowledgements
I would like to thank Professor Eric Chown for his unyielding support during the
course of this research; he truly was an inspirational mentor. Additionally, many thanks
are extended to the members of the honors committee, Louisa Slowiaczek, Rick
Thompson, and Eric Chown, who each read my thesis and returned great feedback.
Special thanks are also due to Professor Louisa Slowiaczek for providing both
extremely interesting data to model and theoretical suggestions that aided the research.
Finally, I would like to thank my parents for continuing to believe in me throughout the
past year even through its periods of frustration.
References
Abraham, W.C. & Goddard, G.V. (1983). Asymmetric relationships between
homosynaptic long-term potentiation and heterosynaptic long-term depression.
Nature 305, 717-19.
Amit, D. J. (1995). The Hebbian paradigm reintegrated: Local reverberations as internal
representations. Behavioral and Brain Sciences, 18(4), 617-657.
Brodie, S.E., Knight, B.W., & Ratliff, F. (1978). The response of the Limulus retina to
moving stimuli: a prediction by Fourier synthesis, Journal of General Physiology,
72, 129-166.
Calvin, W.H. (1995). Cortical columns, modules, and Hebbian cell assemblies. In The
Handbook of Brain Theory and Neural Networks (ed. Arbib, M.A.), Cambridge,
MA: MIT Press, pp. 269-272.
Celsis, P., Doyon, B., Boulanouar, K., Pastor, J., Démonet, J. & Nespoulous, J. (1999).
ERP correlates of phoneme perception in adults. NeuroReport, 8, 919-924.
Chown, E. (1994). Consolidation and Learning: A Connectionist Model of Human Credit
Assignment. Doctoral dissertation. The University of Michigan.
Feldman, J.A. & Ballard, D. H. (1982). Connectionist models and their properties.
Cognitive Science, 19, 1-52.
Frauenfelder, U. H. & Tyler, L. K. (1987). The process of spoken word recognition: An
introduction. Cognition, 25, 1-20.
Hamburger, M. & Slowiaczek, L. M. (1996). Phonological priming reflects lexical
competition. Psychonomic Bulletin & Review, 3(4), 520-525.
Hebb, D. O. (1949). The Organization of Behavior. John Wiley.
Hetherington, P. A. & Shapiro, M. L. (1993). Simulating Hebb cell assemblies: the
necessity for partitioned dendritic trees and a post-not-pre LTD rule. Network, 4:
135-153.
Kaplan, S., Sonntag, M. & Chown, E. (1991) Tracing recurrent activity in cognitive
elements (TRACE): A model of temporal dynamics in a cell assembly.
Connection Science, 3, 179-206.
Kaplan, S., Weaver, M. & French, R. (1990). Active symbols and internal models:
Towards a cognitive connectionism. AI & Society, 4:51-71.
Kleinsmith, L. J., & Kaplan, S. (1963). Paired-associate learning as a function of arousal
and interpolated interval. Journal of Experimental Psychology, 65, 190-193.
Kinsbourne, M. (1982). Hemispheric specialization and the growth of human
understanding. American Psychologist 37(4), 411-420.
Liégeois-Chauvel, C., de Graaf, J.B., Laguitton, V., & Chauvel, P. (1999). Specialization of
left auditory cortex for speech perception in man depends on temporal coding.
Cerebral Cortex, 9, 484-496.
Magleby, K. L. (1987). Short-Term changes in synaptic efficacy. in Synaptic Function,
New York, NY: John Wiley & Sons.
McClelland, J.L., Rumelhart, D.E. & Hinton, G.E. (1986). Parallel Distributed
Processing Explorations in the Microstructure of Cognition, Cambridge, MA:
The MIT Press.
Miller, R. R., & Marlin, N. A. (1984). The physiology and semantics of consolidation. In
H. Weingartner, & E. S. Parker (Ed.), Memory consolidation: Psychobiology of
cognition Hillsdale, NJ: Lawrence Erlbaum.
Milner, P. M. (1957). The cell assembly: Mark II. Psychological Review, 64: 242-52.
Miyashita, Y. (1988). Neuronal correlate of visual associative long-term memory in the
primate temporal cortex. Nature, 335: 817-20.
Miyashita, Y. & Chang, H.S. (1988). Neuronal correlate of pictorial short-term memory
in the primate temporal cortex. Nature 331:68-70.
Minsky, M. & Papert, S. (1969). Perceptrons. Cambridge, MA: The MIT Press.
Muller, D., Joly, M., and Lynch, G. (1988). Contributions of quisqualate and NMDA
receptors to the induction and expression of LTP. Science, 242: 1694-1697.
Murdock, B. B., Jr. (1962). The serial position effect of free recall. Journal of
Experimental Psychology, 62, 482-488.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard.
Pisoni, D.B. & Luce, P.A. (1987). Acoustic-phonetic representations in word
recognition. Cognition, 25: 21-52.
Rauschecker, J. P. (1995). Compensatory plasticity and sensory substitution in the
cerebral cortex. Trends in Neurosciences 18: 36-43.
Rauschecker, J. P. & Sejnowski, T. (1994). Processing of visual and auditory space and
its modification by experience. In: Advances in Neural Information Processing
Systems, vol. 6, ed J. D. Cowan, G. Tesauro & J. Alspector.
Rochester, N., Holland, J. H., Haibt, L. H., & Duda, W. L. (1956). Tests on a cell
assembly theory of the action of the brain, using a large digital computer. IRE
Transactions on Information Theory IT-2: 80-93.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and
organization in the brain. Psychological Review 65:386-408.
Sakai, K. & Miyashita, Y. (1991) Neural organization for the long-term memory of
paired associates. Nature 354:152-55.
Slowiaczek, L. M. (1994). Semantic priming in a single-word shadowing task.
American Journal of Psychology, 107(2), 245-260.
Slowiaczek, L. M. & Hamburger, M. (1992). Prelexical facilitation and lexical
interference in auditory word recognition. Journal of Experimental Psychology:
Learning, Memory and Cognition, 18(6), 1239-1250.
Sonntag, M.L. (1991). Learning sequence in an associative network: A step towards
cognitive structure. Doctoral dissertation. The University of Michigan.
Steinschneider, M., Schroeder, C.E., Arezzo, J.C., & Vaughan, Jr., H.G. (1995).
Physiologic correlates of the voice onset time boundary in primary auditory cortex
(A1) of the awake monkey: temporal response patterns. Brain and Language, 48:
326-340.
White, G., Levy W. B., & Stewart, O. (1990). Spatial overlap between populations of
synapses determines the extent of their associative interaction during the
induction of long-term potentiation and depression. J. Neurophysiol. 64: 1186-98.
Zatorre, R.J., Meyer, E., Gjedde, A.L. & Evans, A.C. (1996). PET studies of phonetic
processing of speech: review, replication, and reanalysis. Cerebral Cortex, 6, 21-
30.