Dynamic liquid association: Complex learning without implausible guidance

Neural Networks 22 (2009) 875–889
journal homepage: www.elsevier.com/locate/neunet
doi:10.1016/j.neunet.2008.10.008

Anthony Morse*, Malin Aktius
COIN Lab, Informatics Research Centre, University of Skövde, Sweden

* Corresponding address: COIN Lab, Informatics Research Centre, University of Skövde, PO Box 408, SE-541 28 Skövde, Sweden. Tel.: +46 500448325. E-mail addresses: [email protected] (A. Morse), [email protected] (M. Aktius).

Article history: Received 12 November 2007; Accepted 27 October 2008

Keywords: Echo state networks; Associative learning; Spreading activation; Non-linear associative memory; Cognitive robotics; Conditioning; Priming

Abstract

Simple associative networks have many desirable properties, but are fundamentally limited by their inability to accurately capture complex relationships. This paper presents a solution significantly extending the abilities of associative networks by using an untrained dynamic reservoir as an input filter. The untrained reservoir provides complex dynamic transformations and temporal integration, and can be viewed as a complex non-linear feature detector from which the associative network can learn. Typically, reservoir systems utilize trained single layer perceptrons to produce desired output responses. However, given that both single layer perceptrons and simple associative learning have the same computational limitation, i.e. linear separation, they should perform similarly in terms of pattern recognition ability. Furthermore, the extensive psychological properties of simple associative networks, and the lack of explicit supervision required for associative learning, motivate this extension overcoming previous limitations. Finally, we demonstrate the resulting model in a robotic embodiment, learning sensorimotor contingencies, and matching a variety of psychological data.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction: A model of human learning

Associative networks (Bruce & Valentine, 1986; Burton, Bruce, & Hancock, 1999; Hopfield, 1982; McClelland & Rumelhart, 1981; Page, 2000; Rumelhart & McClelland, 1986; Young & Burton, 1999) have been used to match data from human experiments in, for example, overt, covert, and occluded recognition, semantic and repetition priming, schemata production, and many other domains of psychological enquiry. The association of conceptual entities provides an intuitive account of psychological phenomena and is highly influential as an explanatory theory within psychology. Simple models of associative plasticity (Hebb, 1949), capable of generating auto-associative memories, also provide a means to model psychological learning (as opposed to acquisition techniques), accounting not only for these phenomena but also for many of the effects described in classical conditioning, operant conditioning, and sensorimotor contingency learning (Morse, 2003, 2005a, 2005b, 2006; Morse & Ziemke, 2007). The problem then is that such models are typically thought to require localist representational units or symbols (Page, 2000) whose activity is independently interpretable and whose existence is pre-given. The difficulty in generating such representations autonomously has severely limited the domains in which these models can be successfully developed. In this paper, we investigate the novel use of an untrained reservoir system providing features of a very different, non-localist, nature which overcomes many of these limitations, allowing for an embodied and general purpose implementation of the associative network models of psychological learning and general purpose (i.e. non-task specific) functionality. We first describe what we take to be the fundamental limiting factors with the standard approach.

The major problem in constructing associative networks of any kind is in deciding what exactly should be associated. By artificially selecting specific symbolic or localist entities upon which learning is to be applied, one restricts the model to a particular task or domain (Bruce & Valentine, 1986; Burton et al., 1999; McClelland & Rumelhart, 1981; McClelland, Rumelhart, & Group, 1986; Page, 2000; Young & Burton, 1999). If alternatively associative learning is applied to the elements of a sensory data stream such as the pixels of a camera, then only in contrived or unusual situations (such as size and position invariant prototypical views) can the network achieve anything like object recognition. As Clark and Thornton (1997) show, the limit of unsupervised (unguided) learning is to find functions satisfying the equation f(x, y) = p, where p is the probability of finding y given x. When f(·) is achieved by linear association, the solution becomes a matter of linear separation between y and x. A different class of problems, requiring an additional transformation of x, are termed relational problems, as their solution requires the relationship between elements of x to be made explicit. This can be written as f(t(x), y) = p, where t(·) is unknown. Again, if f(·) is achieved by linear association, these problems become non-linearly separable, and therefore cannot be solved by simple associative networks. Any problem



can however be represented in a manner making it linearly separable, trivially, for example, by including the answer as part of the representation (Clark & Thornton, 1997). The difficulty is in autonomously finding a way to do so without explicit guidance. This difficulty can be reduced by enriching or supercharging (Thornton, 2000) a data stream with appropriate features such that complex relationships become linearly separable; an appropriate description of the use of an untrained dynamic reservoir, as will be discussed shortly.

The remainder of this paper is structured as follows: Section 2 provides further details of the problems limiting the application of simple associative networks; Sections 3 and 4 introduce Liquid State Machines and Echo State Networks, while Section 5 provides details of the learning associative network. Section 6 then combines these networks detailing the Dynamic Liquid Association (DLA) model; Sections 7–11 then detail a number of real world experiments using the DLA model on a robot, demonstrating obstacle avoidance, semantic and repetition priming, overt and covert recognition, ongoing learning, classical and operant conditioning, generalization, position and scale invariant recognition, behavioural refinement, phobic acquisition and systematic desensitisation.

2. Explicit separation, poverty of stimulus, and temporal dislocation

While associative explanations of psychological phenomena abound, such explanations are incomplete as they first require a solution to problems of marginal regularity, symbol grounding, and temporal credit assignment. From a modelling perspective, all unguided learning algorithms rely on the autonomous identification of regularities (of one form or another¹) in an input stream. As Kirsh (1992) puts it: "what if so little of a regularity is present in the data that for all intents and purposes it would be totally serendipitous to strike upon it? It seems to me that such a demonstration would constitute a form of the poverty of stimulus argument" (p. 317). Following Clark and Thornton's (1997) analysis of the limitations of unsupervised learning (Section 1), such a poverty of stimulus is readily apparent in problems requiring relational discrimination (non-linear separation with respect to the input–output mapping). This is because the necessary features indicating the discrimination are not presented in an explicit form within the data stream.² The solution of such problems therefore requires some discovery or transformation process to make a sufficient set of features explicit.

The problem of marginal regularity is that there are relatively few sensorimotor contingencies that emerge directly between our senses and actions; rather, these contingencies lie between patterns of relationships in sensory data and patterns of relationships in action (Morse & Ziemke, 2007). For example, the regularities or affordances offered by a cup are not consistently between the same retinal pixels and the same muscular actions. Although many such problems are standardly solved, their solutions typically involve a re-coding of the input stream such that those relational features necessary to solve given problems are made explicit (e.g. in a hidden layer, or by feature detection). Without rejecting the idea that evolution gifts us with an appropriate re-coding of our sensory stream, it would seem reasonable to question what form this might take; for example, prior explicit representation of all the mental level entities³ we could ever use seems unlikely. Perhaps a more realistic possibility to consider is a set of entities maximizing separation over features of our sensory streams, from which the mental level can subsequently be constructed. A related problem is that of temporal dislocation, where the result of an action may not be immediate, and may in fact result from a sequence of actions. This leads to a variation of the credit assignment problem (i.e. which subset of the actions performed is actually responsible for this particular sensory change). To solve this problem we need some form of memory so that current events and/or actions can be associated to past events and/or actions, and we need some indication of how long ago these things happened so that expectations can be timed appropriately.

We suggest that in order to explain psychological phenomena in terms of association one is also required to explain the existence of a large set of entities providing sufficient separation to construct complex psychological and abstract concepts, while further explaining, at least, a short term memory capacity. The model that we propose and investigate here does exactly this, and by connection to biological data (next section) we suggest that the same explanation partially accounts for the functioning of the neocortex.

¹ These regularities must be explicit with respect to the means of identifying them; therefore while some algorithms identify non-linear features, their separation, by design, is necessarily linear.
² Relational problems require a relationship between input features to be identified rather than simply the presence or absence of a particular feature (e.g. parity problems).

3. Cortical microcircuits as dynamic reservoirs

This section describes how artificial neural models of cortical microcircuits create an expansion and continuous warping of their input streams which, although non-maximal, provides enhanced separation (without problem specific design) over sensory input streams. The dynamics of cortical microcircuits further provide a tractable fading memory incorporating highly complex and temporal transformations. Thus these models provide candidate solutions to the problems of explicit representation, poverty of stimulus, and temporal dislocation described in the previous section.

3.1. The liquid state machine (LSM)

Typical approaches to neural pattern recognition and behaviour production (e.g. Hopfield networks (Hopfield, 1982) or Beer networks (Beer, 1990, 2000)) can be characterized by the formation of attractors pulling a system state toward various regions of state space, where categorizations or decisions are made as the system settles into this or that basin of attraction. However, in the cortical microcircuit, interesting dynamics emerge from the perturbation away from a single, null state, attractor rather than selection between attractors. Studies of biological cortex suggest that large regions simply cannot produce stable attractors as "their dynamics takes on a life of its own when challenged with rapidly changing inputs" (Maass, Natschlager, & Markram, 2002a, p. 1). An example of such a structure is the cortical micro-column or microcircuit, identified by vertical columns of neurons tending to fire together (no synchrony implied). This cortical structure is typically around 100 neurons in size and can be found in every cortical surface region (Mountcastle, 1998). These structures, receiving complex topologically mapped input from feed-forward sensory projections, have high inward and outward connectivity but are sparsely connected internally (Gupta et al., 2002; Markram, Wang, & Tsodyks, 1998). This sparse internal connectivity combined with a constantly changing input prevents these structures from settling into stereotypical cycles of activity (attractor states). Interesting dynamics therefore emerge from the system's trajectory as it decays back toward a null point rather than from selection between various basins of attraction.

Although microcircuit models can be complicated, what they all have in common is a sparsely interconnected network of neurons in which activity coming into the system reverberates around a large randomly generated network, and would decay to a null state if the input were stopped. Maass et al. (2002a); Maass, Natschlager, and Markram (2002b, 2002c) provide an artificial cortical microcircuit based on neuroscientific data obtained from cortical columns in rat somatosensory cortex (Gupta, 2000; Gupta et al., 2002; Markram et al., 1998). Their model consists of a randomly connected recurrent network of heterogeneous leaky integrate-and-fire (LIF) neurons, of which 20% are chosen at random to be inhibitory. As the behaviour of biological cortical microcircuits is observed to be non-chaotic, sparse interconnectivity is randomly installed in the model (thus avoiding chaotic dynamics). In the LSM models, this interconnectivity includes a variety of both static and dynamic synapses, the parameters of which are also based on Gaussian distributions of the neuroscientific data. The resulting neural network implements a null point attractor.⁴ However, in biological cortical columns, as in the modelled microcircuits, there is constant input from the sensory stream, thus the microcircuit is constantly perturbed from its null state and never reaches it. This input is randomly connected to the microcircuit to provide a constant interplay between input perturbing the system and the null attractor decaying it; the resulting network displays highly complex non-chaotic dynamics. Although the interconnectivity of the cortical microcircuit is

³ The conscious atoms of thought, e.g. concepts, linguistic powers, spatial locations, emotional states etc.

randomly generated, the model by Maass et al. (2002a, 2002b, 2002c) (see Fig. 1) further incorporates supervised learning in the training of parallel perceptrons⁵ acting as readout units to the model. The resulting system, known as a Liquid State Machine (LSM), has been demonstrated to provide a powerful method of supervised pattern recognition, performing well in various benchmark classification and delayed reconstruction problems (Maass et al., 2002a, 2002b, 2002c). However, as the discriminatory abilities of a single layer perceptron remain limited to linear separation (Minsky & Papert, 1969), we replace the perceptrons with an unsupervised associative learning algorithm having similar discriminatory properties⁶ (Morse, 2003) (see Section 4).

In order to understand how the LSM system works, Maass et al. use an analogy between the activity reverberating around a microcircuit and the ripples on a pond. Input to either system (e.g. dropping a stone into the water, or providing sensory input to the LSM) causes activity (or ripples) which then reverberates around the system, decaying over time. Just like the ripples on the pond, this residual activity is far from random and can be used to tell us something about the disturbance(s) that originally caused it; thus it must carry information about those disturbance(s). This is formalized by a separation property: a point-wise measure of the (Euclidean) distance between subsequent states following different input perturbations. The greater the difference is between input vectors, the greater the difference will be in the subsequent states of activity within the microcircuit (Fernando & Sojakka, 2003; Maass et al., 2002a, 2002b, 2002c). In constructions such as the LSM, this separation property is implemented in global

⁴ Without continued input, the activity reverberating throughout the network necessarily decays at each time step until there is no activity left.
⁵ Linear readout units, connected to every neuron in the microcircuit by a single layer of weighted connections derived through supervised gradient descent.
⁶ As discussed in the introduction, unsupervised learning methods (adaptation based on internal, locally available, data, rather than implausible training signals) are able to solve problems of linear separation.

Fig. 1. An example of the structure of a Liquid State Machine — cortical microcircuit. Dark circles indicate inhibitory neurons. Two input neurons are also shown on the left. For clarity only the 1st, 2nd, and 3rd order directional connections of a single microcircuit neuron are shown (i.e. connections to that neuron, and loops involving up to three other neurons).

dynamics, guaranteeing a temporary preservation of information (within the decay period of the network) so long as the network does not become chaotic. Thus, because the microcircuit cannot generate its own activity (guaranteed by the null attractor), and differences are preserved (guaranteed by the separation property), we know that all activity is a continual warping of the current and recent input history.⁷ Such functionality is very similar to the method of pattern recognition carried out by a Support Vector Machine (Christianini & Shawe-Taylor, 2000), where original input data is mapped into a high-dimensional feature space in which an optimal separating hyperplane (maximizing the margin of separation between classes) is derived. This decision boundary is linear in feature space, but may be non-linear in input space. The use of a kernel function makes the computation of the optimal hyperplane possible without the need for an explicit mapping from input to feature space to be carried out. Unlike the computationally efficient kernel of an SVM, the microcircuit explicitly implements the high dimensional feature space (achieved without supervision) in which linear boundaries can separate data which is not linearly separable in its original format. While approaches designed to make explicit as many relational features as possible cause a combinatorial explosion, the microcircuit's non-stationary dynamics provide transient separation over relational features within a finite and tractable space.⁸ Thus features are distributed not only over space but also through time. The additional temporal aspect of these models over the Support Vector approach can be viewed as a recursive projection; thus the result of warping at any time will be part of the next warping. Besides being biologically plausible, the LSM model provides integration over time leading to a temporally changing warping, while the SVM does neither.

⁷ Activity is equally related to the specific, randomly generated, architecture of that microcircuit (rather like a kernel function).
⁸ No guarantee is made that any particular relational feature will be made explicit.
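The separation property can be illustrated with a small numerical sketch (our illustration, not the authors' implementation; the network size, sparsity, and perturbation scales are arbitrary choices). A randomly generated reservoir is driven by an input history, and the Euclidean distance between final states grows with the size of the perturbation applied to that history:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 80
W = rng.uniform(-1, 1, (N, N)) * (rng.random((N, N)) < 0.1)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # enforce the null attractor

W_in = rng.uniform(-1, 1, (N, 3))

def run(inputs):
    """Drive the reservoir with an input sequence and return its final state."""
    x = np.zeros(N)
    for u in inputs:
        x = np.tanh(W @ x + W_in @ u)
    return x

base = [rng.uniform(-1, 1, 3) for _ in range(20)]
v = rng.uniform(-1, 1, 3)                # perturbation direction
small = [u + 0.01 * v for u in base]     # slightly different input history
large = [u + 0.50 * v for u in base]     # very different input history

d_small = np.linalg.norm(run(base) - run(small))
d_large = np.linalg.norm(run(base) - run(large))
print(d_small < d_large)                 # larger input difference gives a
                                         # larger point-wise state distance
```

Note that, as with a kernel, the measured distances also depend on the particular randomly generated architecture (footnote 7): a different seed gives different distances, but the ordering between small and large perturbations is preserved.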


3.2. Echo State Network (ESN)

Originating from a very different design process, the Echo State Network (ESN) (Jaeger, 2001a, 2001b, 2002a, 2002b) is computationally simpler than the LSM model but shares key properties. Rather than model the complex dynamics of biological synapses and LIF neurons, the ESN consists of a randomly generated Continuous Time or Discrete Time Recurrent Neural Network (CTRNN/DTRNN) implementing a null point attractor. As ESNs display similar dynamics to LSMs, and are able to provide similar separation over recent events in the sensory stream (Jaeger, 2001a, 2001b, 2002a, 2002b), we can surmise that the necessary factors on which separation relies are shared between the two models. An ESN circuit can be generated by various methods, including sparse random weight/interconnectivity generation. However, the randomly achieved weight matrix W of the resulting network is restricted to have a spectral radius of less than one, i.e. |λmax| < 1, where λmax is the eigenvalue of W which has the largest absolute value; this guarantees a null state attractor (Jaeger, 2002b). As the separation property is also preserved (Jaeger, 2001a, 2001b, 2002a, 2002b), the ESN is here viewed as a computational simplification of the LSM architecture. As with the LSM, Jaeger also uses trained perceptrons as readout units to the network; however, as previously stated, within the experiments presented here these perceptrons are replaced by an unsupervised associative learning algorithm with similar discriminatory powers (see next section).
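The generation procedure described above can be sketched in a few lines (a minimal discrete-time tanh reservoir; the function names and parameter values are our illustrative choices, not Jaeger's): the random weight matrix is rescaled so its spectral radius is below one, and once input is removed the reverberating activity fades back to the null state.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n=100, density=0.1, spectral_radius=0.9):
    """Generate a sparse random weight matrix W and rescale it so that
    |lambda_max| < 1, which guarantees the null-state attractor."""
    W = rng.uniform(-1.0, 1.0, (n, n)) * (rng.random((n, n)) < density)
    lam = np.max(np.abs(np.linalg.eigvals(W)))   # largest |eigenvalue|
    return spectral_radius * W / lam

def step(W, W_in, x, u):
    """One discrete-time tanh update; with u = 0 the state decays toward 0."""
    return np.tanh(W @ x + W_in @ u)

W, W_in = make_reservoir(), rng.uniform(-1.0, 1.0, (100, 2))
x = np.zeros(100)
for _ in range(5):                     # perturb the reservoir with input
    x = step(W, W_in, x, np.ones(2))
driven = np.abs(x).max()
for _ in range(200):                   # remove input: activity reverberates
    x = step(W, W_in, x, np.zeros(2))  # and fades back to the null state
print(driven > 1e-3, np.abs(x).max() < 1e-6)
```

Rescaling by the largest absolute eigenvalue is the simplest of the generation methods mentioned; any method leaving |λmax| < 1 yields the same qualitative decay behaviour.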

4. Associative learning

Following connectionist work on spreading activation models, a simple learning algorithm has been demonstrated for the ongoing construction of Interactive Activation and Competition (IAC) networks (Morse, 2003). Prior to this construction method, IAC models (Grossberg, 1978; McClelland & Rumelhart, 1981; McClelland et al., 1986; Young & Burton, 1999) consisted of a number of localist (independently interpretable) units connected via designed connectivity to other localist units, such that activity spreads between related features while incompatible features compete (see Fig. 2). Unlike architectures derived through evolution or gradient descent supervision, these networks function in a manner introspectively similar to mental activity, in that each thought, concept or idea primes related thoughts, concepts, and ideas (McClelland et al., 1986; Morse, 2005a, 2005b). To autonomously construct these architectures, a unit selective to the current context is generated by autonomous pattern recognition. In the models presented herein this is achieved using Adaptive Resonance Theory (ART) (Carpenter & Grossberg, 1987; Grossberg, 1987) as it allows for the ongoing identification of consistent patterns in subsections of an otherwise varying input vector (see Section 4.1). Alternative methods of pattern recognition may be equally valid but are not considered here.

The units selective to context (isomorphic with the Person Identification Nodes in Fig. 2) are then fully connected to all non-context units with Hebbian-adaptive connections, initially weighted at 0, and interact in a mutually inhibitory manner such that the most active unit is typically far more active than any other unit. The connections to non-context units are then subject to positive and negative Hebbian learning by the following rule:

The normalized Hebbian update function:

Δwij = λ·xi·xj·(1 − wij)    (1)

where λ is a constant learning rate, xi is the current activation level of the ith unit and wij is the strength of the connection between unit i and unit j.
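Eq. (1) can be sketched directly (an illustration only; the learning rate and activation values are arbitrary choices): repeated positive co-activation drives a weight asymptotically toward 1 because of the (1 − wij) factor, while anti-correlated activity decreases it.

```python
def hebbian_update(w, x_i, x_j, lr=0.1):
    """Normalized Hebbian rule of Eq. (1): dw = lr * x_i * x_j * (1 - w).
    The (1 - w) factor bounds the weight at 1 under repeated co-activation."""
    return w + lr * x_i * x_j * (1.0 - w)

w = 0.0
for _ in range(100):                # repeated positive co-activation
    w = hebbian_update(w, 1.0, 1.0)
print(round(w, 3))                  # → 1.0 (asymptotic saturation)

w2 = hebbian_update(0.5, 1.0, -1.0)  # anti-correlated activity
print(round(w2, 2))                  # → 0.45 (association weakened)
```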

Fig. 2. An IAC architecture. Lines shown between units indicate positive connections while units in individual clouds are negatively connected. In this example, three pools of information are shown, connected by a fourth pool of 'Person Identification Nodes' which separate the associations between different contexts (in that they are differentially active in different global contexts as a result of those connections).

Fig. 3. Showing the result of applying autonomous learning to a set of localist units. In the centre is a pool of mutually inhibitory context units identified through autonomous pattern recognition; solid lines show positive connections while dashed lines show inhibitory connections. This architecture is functionally similar to the IAC model shown in Fig. 2.

External input to the network provides activity to units and the internal cycle spreads this activity via connections according to the following rules:

The positive correlation update rule:

Δxi = (1 − xi)(Σj wij·xj) − δ(xi − σ).    (2a)

The negative correlation update rule:

Δxi = (xi + 1)(Σj wij·xj) − δ(xi − σ)    (2b)

where δ is the rate of decay and σ is the constant resting level of activity for any unit.

Following this method of generating IAC models, inhibitory connections result from the context units rather than from inhibition within a pool, although the context units (generated from pattern recognition) remain mutually inhibitory (see Fig. 3). To preserve the properties of the original IAC architecture, excitatory connections are bi-directional, while inhibitory connections are directional from the context units to non-context units, thus replacing the IAC within-pool inhibition.

When applied to localist domains, this combination of pattern recognition and Hebbian learning generates architectures functionally similar to IAC models and preserves their psychological properties (Morse, 2003, 2006; Young & Burton, 1999).⁹ Although the learning process described is autonomous, the requirement of a pre-defined localist node set is highly restrictive and typically prevents the use of this model in robotic contexts (see Section 2). As with all unsupervised learning, as the network gains experience, problems whose solution requires linear separation imply a one to one correlation (positive or negative) between the explicit discriminatory feature and the desired response. Given comparable experience of perfect correlations, Hebbian learning will modify weighted connections to produce the strongest responses to these features, therefore providing solutions to these problems.¹⁰ In selecting and designing localist representations the resulting models are typically abstract theoretical models (where units are not anchored to sensory stimulation), and are restricted to the solving of problems which are presented in this linearly separable format.

As suggested in Section 1, a potential solution to the problem of achieving separation is not to present (and by implication design) precisely those representations necessary to render specific problems as separable (as in localist modelling), but rather to incorporate a generic process by which all data streams can be expanded to increase separation over potentially relevant features of the current and recent input stream. To this end the unsupervised algorithm just described is now applied directly to the activity space of a cortical microcircuit receiving input from the sensory apparatus of a real robot, rather than to a designed localist space or directly to the agent's sensory data streams (as is also provided for comparison in Sections 6–10).
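The spreading activation rules (2a)/(2b) can be sketched on a hypothetical three-unit network (iac_step, the hand-set weights, and all parameter values are our illustrative choices, not the paper's implementation): external input to one unit spreads positive activation to an associated unit and suppresses a negatively connected one, mirroring the priming and competition described above.

```python
import numpy as np

def iac_step(x, W, ext, decay=0.1, rest=0.0):
    """One spreading-activation update following Eqs. (2a)/(2b): positive
    net input is scaled by (1 - x), negative net input by (x + 1), and
    every unit decays toward its resting level."""
    net = W @ x + ext
    dx = np.where(net > 0, (1.0 - x) * net, (x + 1.0) * net) - decay * (x - rest)
    return np.clip(x + dx, -1.0, 1.0)   # activations bounded in [-1, 1]

# hypothetical 3-unit network: unit 0 excites unit 1 and inhibits unit 2
W = np.array([[ 0.0, 0.5, -0.5],
              [ 0.5, 0.0,  0.0],
              [-0.5, 0.0,  0.0]])
x = np.zeros(3)
ext = np.array([0.6, 0.0, 0.0])   # external input to unit 0 only
for _ in range(50):
    x = iac_step(x, W, ext)
print(x.round(2))                 # unit 1 is primed, unit 2 suppressed
```

The scaling factors keep activity bounded: a strongly active unit responds less to further excitation, and a strongly suppressed unit responds less to further inhibition.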

4.1. Adaptive Resonance Theory

As Adaptive Resonance Theory (ART) (Carpenter & Grossberg, 1987; Grossberg, 1987) is used in the autonomous generation of IAC-like architectures, we refer the reader to Carpenter and Grossberg (1987) for a detailed analysis or to Gurney (1997) for implementation details. In summary, ART refers to a family of self-organizing networks capable of clustering pattern space and producing weight vector templates while solving the plasticity–stability problem (Carpenter & Grossberg, 1987; Grossberg, 1987). ART networks have no separate training or testing phases; instead they simply form new pattern templates as and when they are needed. The simplest member of the ART family is ART1, which is used for storing binary patterns (see Fig. 4). Here we simply apply a threshold to the activity of each feature to decide whether it is on or not; this then provides input to the ART1 network. During use, the first layer of the ART1 network, template matching (F1), receives both a binary input pattern and its complement (inverted) pattern. This then propagates to the template choosing layer (F2), which constitutes the stored patterns, via bottom-up weights storing the pattern templates in much the same way as a competitive learning network. The best matching unit (between input pattern and stored bottom-up weights) is then the most active of the template choosing layer. This node then changes the activity in the template matching layer via top-down weights encoding a normalized version of the bottom-up weights. The resulting change in the F1 layer has consequences on the activity of the F2 layer such that only closely matching patterns can sustain the activity of the winning F2 pattern node. If activity is sustained then the winning node's weights are updated to more closely resemble the current input (removing features not held in common between the current and stored pattern). If it is not sustained then it is

9 For alternative generation methods displaying similar properties see Page (2000).
10 E.g. if a implies b then, given a, expect b.

Fig. 4. The ART1 network architecture. To preserve clarity, only one set of bottom-up and top-down weights has been included in this diagram.

Fig. 5. The Dynamic Liquid Association (DLA) architecture. Left shows input to a cortical microcircuit from a robot's sensory apparatus, the activity of which then feeds into the distributed associative network (right). This activity generates and activates pattern recognition units (ART) (top), which are associated not only back to the microcircuit, but also to motor output units (bottom right).

excluded from the search and the process starts again. If no node in the template choosing layer is active enough (determined by a resonance parameter, ρ) then a new node is generated with weights matching the current input. Further details of the ART1 network can be found in Carpenter and Grossberg (1987) and Grossberg (1987).
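The matching–vigilance cycle just described can be sketched in a few lines. This is a simplified, hedged reconstruction rather than the authors' implementation: it handles binary vectors only, uses a crude approximation of ART1's bottom-up choice function, and takes its default vigilance from Table 1 (ρ = 0.9).

```python
import numpy as np

def art1_step(x, templates, rho=0.9):
    """One presentation of a binary input to a minimal ART1 sketch.

    x         : binary input vector (0/1).
    templates : mutable list of stored binary templates (F2 nodes).
    rho       : vigilance (resonance) parameter, as in Table 1.
    Returns the index of the winning (or newly created) template.
    """
    x = np.asarray(x, dtype=bool)
    # Rank stored templates by bottom-up match (normalised overlap),
    # mimicking the competitive choice in the F2 layer.
    order = sorted(range(len(templates)),
                   key=lambda j: -np.logical_and(x, templates[j]).sum()
                                 / (0.5 + templates[j].sum()))
    for j in order:
        t = templates[j]
        overlap = np.logical_and(x, t).sum()
        # Vigilance test: is the top-down template close enough to the
        # input to sustain resonance?
        if overlap / max(x.sum(), 1) >= rho:
            # Resonance: prune features not shared with the input.
            templates[j] = np.logical_and(x, t)
            return j
    # No stored template resonates: recruit a new node for this input.
    templates.append(x.copy())
    return len(templates) - 1
```

A repeated pattern resonates with its existing template, while a sufficiently novel pattern recruits a new node, which is the behaviour the IAC-generation process relies on.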

5. Uninformed learning from liquid spaces: Dynamic Liquid Association

If it is possible to train a perceptron readout unit in an LSM or ESN to respond differentially to a particular input feature, then even without doing so we know (given the limitations of single layer perceptrons) that that feature must be linearly separable somewhere within the microcircuit. By this logic the supervised training of readout units will not provide any greater separation than is already present. Thus if we apply an unsupervised learning algorithm directly to the microcircuit, it will behave as if all the features which could in principle be trained for identification by a single layer perceptron were explicitly represented anyway; the perceptron training is thus an unnecessary step. By applying the unsupervised learning algorithm described in Section 4 to the activity space of a cortical microcircuit (see Fig. 5), the limit of linear separation is extended to include all those features of the sensory data stream which are made linearly separable within the microcircuit. The schematic behaviour typical of an IAC network is therefore enacted from environmental experience as it shapes the development of the agent (Morse & Ziemke, 2007).

In order to embed the model in a real-world robot, we must first address how to produce behaviour from spreading activation


architectures. Although overly simplistic, the approach taken here is to provide a number of winner-takes-all motor outputs to which all generated context units are connected by weights modified by Eq. (1).11 Providing weak random noise to these motor outputs is then sufficient to provoke initial random behaviour from which the system can learn. As the agent randomly moves, its sensory input changes and the network adapts to predict contingencies relating to experienced invariance. It is anticipated that given sufficient experience this will lead to a mastery of the sensorimotor contingencies of the agent's particular embodiment.12
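The control cycle described above (a fixed, untrained reservoir; winner-takes-all motor outputs perturbed by weak noise; associative weights adapted by experience) can be illustrated roughly as follows. This is a sketch under stated assumptions: the sizes, the learning rate `eta`, and the scalar `punished` flag are illustrative, and the simple Hebbian update stands in for the paper's Eq. (1), which is not reproduced in this section.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (untrained) ESN-style microcircuit: sparse recurrent weights
# rescaled so the spectral radius is below 1 (cf. Table 1).
N, N_IN, N_MOTOR = 100, 3, 4
W = rng.uniform(-1, 1, (N, N)) * (rng.random((N, N)) < 0.1)
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()
W_in = rng.uniform(-1, 1, (N, N_IN))
x = np.zeros(N)

# Learned associative weights from reservoir features to motor outputs.
V = np.zeros((N_MOTOR, N))

def step(u, punished=False, eta=0.01, noise=0.1):
    """One control cycle: reservoir update, noisy winner-takes-all motor
    choice, and a Hebbian update weakened by the punishment signal."""
    global x
    x = np.tanh(W @ x + W_in @ u)                       # microcircuit state
    m = V @ x + rng.uniform(-noise, noise, N_MOTOR)     # weak motor noise
    action = int(np.argmax(m))                          # winner takes all
    V[action] += eta * (-1.0 if punished else 1.0) * x  # crude Hebbian step
    return action
```

Before any learning, the noise term alone drives behaviour; as `V` grows, reservoir features increasingly bias which motor output wins, which is the mechanism the experiments below rely on.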

6. Experimental setup

The Dynamic Liquid Associative network (DLA), as described in the previous section, has been embodied in a variety of robots and shown to learn and produce coherent adaptive behaviour in a general (non-task-specific) and psychologically plausible way (Morse, 2005a, 2005b, 2006; Morse & Ziemke, 2007), through local self-organization rather than through error propagation, implausible supervision or prior design. In each experiment detailed here the agent has been implemented using a wheeled robot in a real environment (office). The robot (shown in Fig. 6) is equipped with three forward facing infrared sensors and six bumper/collision sensors. No alterations in the methods described are necessary to account for different forms of incoming data, and the robot's sensors are randomly connected to the cortical microcircuit (LSM or ESN) to which they provide a constant input perturbation. The resulting activity vectors of the microcircuit are then subject to associative learning, constructing an IAC-like network of contextualised micro-features. Through the addition of motor outputs (for the production of behaviour) and inhibitory bumper feedback (providing motivation), the network should adapt to produce coherent behaviour with respect to its motivational drives, in this case minimizing its bumper activity.

Weak random noise is added to the activity of each motor neuron on every update cycle in order to provide the agent with initial random behaviour and hence initial experience (albeit of disorganized behaviour) from which it can learn. The implementation is such as to prevent the agent from remaining still (winner takes all between the motor outputs), as this would be an uninteresting solution to the collision avoidance problem for which the agent is motivated. The winning motor output simply increments the speed of each motor appropriately rather than providing an absolute motor response (i.e. forward increments the left and right motors by 1, backwards decrements them both by 1, left increments the left motor and decrements the right, and right increments the right motor and decrements the left). As the agent's homeostatic/motivational drives vary in activity (provoked by collisions), they become associated with features of the microcircuit's activity vector consistent with their various motivational states. This provides a basis for future prediction of motivational states via these associations to the microcircuit. Once the agent has unwittingly produced behaviour adversely affecting its motivational drives (such as a collision), the future activity of that drive will, in relevant contexts, inhibit behaviour leading to a predicted collision. The effect of motivation can be viewed in the following way: Firstly, the agent produces habits, in that it is more likely to do whatever it has done before. Secondly, each habit

11 This approach to behaviour production ignores the complexity of biological behavioural mechanisms; however, for the purpose of demonstrating an extension to unsupervised abilities this mechanism will suffice.
12 With respect to those input features which become linearly separable in the microcircuit activity.

Table 1
Parameters used in the experiments. Note that experiment 3 uses different sizes of the ESN networks, as explicitly stated in Section 9. Furthermore, it should be noted that all neuron activity outputs lie in [−1, 1]. For the ESN/LSM neurons this is due to the tanh activation function, whereas for neurons in the associative network, this is due to the activation rule given in Eqs. (2a) and (2b).

Bumper sensors value: 0, 1
Range of IR sensor inputs: [0, 1]
Range of motor neurons output: [−1, 1]
Range of motor neuron noise: [−0.1, 0.1]
Connectivity probability in ESN/LSM: 0.1
ESN spectral radius: ]0, 1[
Number of neurons in ESN: 100
Number of neurons in LSM: 125 (5*5*5)
ART1 resonance parameter, ρ: 0.9
Associative learning decay rate, δ: 0.5
Associative learning resting level, σ: −0.1

becomes associated with a reward or punishment and is then made more or less likely by this association. Finally, by inhibiting habits that have an association to punishment, alternative weaker habits are given a chance to become the dominant habitual response in those situations. Properties of the parameters used in the experiments are listed in Table 1.

Assuming that the microcircuit has made sufficient relevant features separable, it is anticipated that the future behaviour of the agent will become organized and coherent with respect to the sensorimotor contingencies it has so far experienced. This set-up results in an embodied developmental agent displaying multiple psychological phenomena and is open to structured learning through human and environmental interaction (Morse, 2005a, 2005b, 2006; Morse & Ziemke, 2007). This is demonstrated in the following experiments.
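For reference, the associative-network units are bounded in [−1, 1] and governed by the decay rate δ and resting level σ of Table 1. Since Eqs. (2a) and (2b) are not reproduced in this excerpt, the exact update below is an assumption: it is the standard IAC-style activation rule in the McClelland–Rumelhart form, instantiated with Table 1's parameter values.

```python
def iac_update(a, net, delta=0.5, sigma=-0.1, a_max=1.0, a_min=-1.0):
    """IAC-style activation update (McClelland & Rumelhart form).

    a     : current activation, kept in [a_min, a_max]
    net   : summed weighted input to the unit
    delta : decay rate towards the resting level (Table 1)
    sigma : resting level (Table 1)
    """
    if net > 0:
        # Excitatory input drives the unit towards its ceiling a_max.
        da = net * (a_max - a) - delta * (a - sigma)
    else:
        # Inhibitory (or zero) input drives it towards its floor a_min.
        da = net * (a - a_min) - delta * (a - sigma)
    return min(a_max, max(a_min, a + da))
```

With no input, activity decays towards the resting level σ; the decay term also produces the short-lived residual activity that the priming results below depend on.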

7. Experiment 1: Basic priming

The robotic agent is placed in an office environment and left to roam around the room. Three versions of the agent were generated: one used the cortical microcircuit from an LSM, one from an ESN, and, for comparison, one version had no cortical microcircuit and made associations directly between the bits of sensory input. As no significant behavioural differences were found between the LSM and ESN versions of the agent, all comparisons shown are between agents with cortical microcircuit models and those without. It has been suggested that this comparison is unfair and that we should at least include temporal correlation algorithms; however, our aim is to demonstrate a biologically plausible model extending the capacities of the original association models while preserving their relevance to psychological modelling. It is with this explicit aim that the experiments reported here have been devised.

Initially, the agent's movement is random (being provoked by random noise), which can cause the agent to collide with various objects in the room. The experiment has been repeated more than 100 times in each condition (LSM, ESN, & no microcircuit). In every case, agents incorporating microcircuits acquired collision avoidance and exploratory behaviour, while non-microcircuit agents became locked in cyclical sequences. Following an initial period of experienced collisions, every microcircuit agent is found to be behaviourally responsive to increased infrared activity, indicating a close object (see Table 2). By comparison the non-microcircuit model, being locked into specific behaviours, displayed no significant correlation between sensor and motor activity.

From the first row of each table in Table 2 we can see that,

during the first 500 cycles of the network, there is very little


Fig. 6. The SEER-1 robot, schematic (left) and picture (right). The robot is equipped with bumper/collision sensors (B1–B6) on all sides, two motorised wheels, and three forward facing infrared sensors (Ir1–Ir3). Movement is provided by two powered wheels and communication is via wireless serial port.

Fig. 7. Time lapse pictures of the robot in motion during the final 500 cycles of the experiment. Left: the agent backs away from the wall and begins spinning on the spot. Centre: the agent remains spinning on the spot. Right: the introduction of a new obstacle causes the robot to back away and search for another location free from obstacles.

Table 2
Resulting behavior from one microcircuit agent, showing the proportion of variability (calculated using Pearson's R) in the activity of motor neurons accounted for by the activity of sensory input. The top table shows values for the left sensor, the middle table shows values for the centre sensor, and the bottom table shows values for the right sensor. Columns (from left to right) refer to activity in the Forward, Backward, Left and Right motor neurons respectively. The four rows in each table (from top to bottom) show consecutive slices in time: 0–500 cycles, 500–1000 cycles, 1000–1500 cycles, and 1500–2000 cycles.

Cycles Forward Backward Left Right

Left sensor

0–500 0.000128 0.00153 0.002004 0.000195

500–1000 0.238669 0.386965 0.305336 0.34429

1000–1500 0.413429 0.31957 0.45488 0.34429

1500–2000 0.388337 0.585232 0.263986 0.365513

Center sensor

0–500 0.009571 0.006931 0.000629 0.002553

500–1000 0.401961 0.532228 0.398309 0.46776

1000–1500 0.479666 0.459624 0.58746 0.36847

1500–2000 0.466209 0.660512 0.315846 0.437285

Right sensor

0–500 0.005428 0.000017 0.010839 0.007277

500–1000 0.369103 0.440576 0.19548 0.359117

1000–1500 0.360411 0.345432 0.373114 0.26514

1500–2000 0.392733 0.533229 0.262428 0.387525

initial correlation between sensory activity and motor neuron activity, where the average correlation was 0.39%. This increased to 37% in the second 500 cycles, 39.8% in the third 500 cycles, and 42.2% in the final 500 cycles, showing a significant increase in the correlation between input and output. The initial lack of correlation is caused by random noise; however, as learning takes place the correlation between sensory input and motor response becomes high. Disregarding behaviour in the first 500 cycles, the correlation between the centre sensor and motor actions was consistently higher (averaging 52%) than between left (averaging 48%) and right sensors (averaging 45%). While the correlation between the magnitude of each of the sensors and each motor direction increased significantly, this was not a simple one-to-one relationship, indicating that a more complex relationship was emerging. Observed in the last stage (1500–2000 cycles), this agent typically approached obstacles, oscillating left and right, until it was reasonably close, at which point it would back away and normally turn right. This behaviour accounts for the correlations between sensor activity and all motor responses.

As can be seen from Table 2, for this agent, sensory activity

tends to have a higher correlation with the backward direction (second column), or left direction (third column), than with the forward direction (first column) (the highest correlations are shaded). Thus, when presented with increased sensory activity, the agent typically either turned left or reversed away from the object. High correlation figures elsewhere in the table show that although reverse and left were the most common responses, the agent often selected alternative behaviours as its internal context (winning ART unit) changed.

Fig. 7 shows the learned behaviour of one of the microcircuit agents in 15 s time lapse pictures. For this agent, obstacles provoke a backing away behaviour followed by spinning on the spot when obstacles are at a sufficient distance. Spinning on the spot can be maintained for long periods of time while remaining


Fig. 8. The change in activation of motor neurons plotted against the change in the left sensory value for a single run using an echo state network agent. The top row shows data from the first 500 network cycles while the bottom row shows data from the following 500 network cycles. Similar results were also found for the centre and right facing sensors.

sensitive to environmental changes such as the addition of a new obstacle (Fig. 7 right). This not only demonstrates the acquisition of a successful avoidance strategy but also shows resilience to catastrophic forgetting, as the agent retains its avoidance behaviours despite prolonged use of different behaviours such as spinning.

A surprising result can be seen in Fig. 8, where the gain in correlation (change from the top to bottom row) between the directions of change of sensory vs. motor neuron activity is positive for each motor and for each sensor. This is explained as the motor action carried out on each time step is calculated from the winner of the motor neurons; thus it is the relative difference, rather than an absolute positive or negative influence, which determines the action to be implemented. Further analysis of the weights' values from individual context neurons shows that there is consistently one positive and three negatively weighted connections to the four motor neurons. High values in alternative behavioural correlations therefore demonstrate that behaviour is not simply triggered by sensory activation and must therefore rely on more complex feature discrimination.

The presence of a close object (with the exception of initial

behaviour) always provokes the agent to turn, or back away from the source of this increased activity (sensors are forward facing); thus the reverse motor output (back away behaviour) is consistently primed by the presence of an object within the range of the agent's infrared sensors. Further to this, the closer the object is to the agent, the stronger the activity of the reverse or left motor neuron becomes (see Fig. 9). Thus, although the agent may not respond overtly with observable external behaviour (as behaviour is determined by the strongest output), upon sensing an object (indicated by increased IR activity), the back away or turn response gains strength, increasing as the object gets closer until the agent eventually responds overtly (see Fig. 9).

As the only motivating factor is to avoid bumper activity, it is not surprising that the typical agent discovers simple repetitive

behavioural routines which normally avoid collisions, the most common being spinning on the spot (see Fig. 7; note the agent is unable to remain still). This experiment has been repeated many times, and occasionally an agent will develop a preference for movement in a particular direction (either forward or backward), which combined with acquired collision avoidance produces an exploratory behaviour. In either case (spinning or exploring), following an initial period of experienced collisions, every agent is found to be responsive to increased infrared activity (indicating a close object).

In Fig. 9, the initial presence of an object in range of the infrared sensor provokes an increased reverse response. As the object is removed (at time step 45) the reverse output of the agent drops in activity and another action takes over. As the agent turns it encounters another object further away (time step 65), priming the reverse motor output but not overtly (i.e. the reverse action is not yet taken). As the agent moves forward (time steps 80–100) the object gets closer and the reverse motor output gains significantly in activity until it produces an overt response: backing away from the object (time steps 100 onward).

Semantic priming refers to the phenomenon whereby the speed

of recall of facts associated with a stimulus can be robustly manipulated by recent prior presentation of other similarly associated stimuli (Bruce & Valentine, 1986; Burton et al., 1999; Young & Burton, 1999). In DLA architectures, as in other spreading activation models, associations between symbols, micro-symbols, or behaviours lead to a transfer of energy between active and associated entities. The target response, having been recently primed by an initial first stimulus, retains decaying residual activity for a short period of time. If the second stimulus is presented during this time, the subsequent re-activation of the target is speeded up by this residual activity (see Fig. 10). In human data from experimental psychology (Bruce & Valentine, 1986; Burton et al., 1999; Young & Burton, 1999) this effect crosses domains of knowledge but remains short-lived. In the DLA agent, semantic priming can


Fig. 9. The activity of the centre infrared input neuron (upper line) and the activity of the reverse motor output neuron (lower line) over time. In this particular agent run, the correlation between infrared activity and the strength of the reverse motor output is high. Other agents showed similar correlations, although this was occasionally with turning rather than reversing and was usually accompanied by an inverse correlation to the forward motor output.

be demonstrated where the short presentation of an object to one infrared sensor followed by similar presentation to a different infrared sensor provokes a faster avoidance reaction (on presentation to the second sensor) than presentation to that sensor alone (see Fig. 10). In accordance with Burton's explanation (Young & Burton, 1999), semantic priming is a result of residual decaying activity in items/elements/micro-features associated with both stimuli.

Repetition priming describes a far more constrained effect in which repeated stimulation of associated pairs (such as repeating a list) causes a faster recall of response following presentation of one of the pair (Bruce & Valentine, 1986; Young & Burton, 1999). This is also related to exposure effects in which strong associations are far more readily recalled than weaker ones (see Section 3), and so it is theorised that repetition, increasing experience of the correlation, leads to an increased strength of connection and thus a stronger and faster response (Young & Burton, 1999). Again, the DLA agent readily displays this effect. For example, an agent whose experience of collisions is predominantly on the left side will respond to the presence of objects on this side more quickly than to objects on the right side (see Fig. 11).

Overt and covert recognition: In addition to priming, Figs. 10

and 11 also show overt recognition, in that upon presentation of either stimulus an appropriate behavioural (avoidance) response is demonstrated. Covert recognition can be induced when low levels of primed activity cause changes in context resulting in subtle behavioural changes rather than direct responses (see Fig. 9, time steps 80–10013). Thus, appropriate behaviours can be primed by very weak differences in activity, insufficient for overt recall. Similar behaviour can also occur from damage degrading the strength of associations. In specific input regions this would lead to the exhibition of prosopagnosia (Young & Burton, 1999), and would adversely affect further learning relating to new instantiations of the covertly recognized stimulus (see Section 3).

Ongoing learning: Agents developing an exploratory roaming behaviour (i.e. those agents that did not remain spinning on the spot or within a small bounded region) were left for several hours, during which two very different behavioural sequences emerged. Agents which predominately remained in open spaces became very sensitive to infrared activity (indicating an object or wall in

13 It is not until time step 100 that the agent produces an overt behavioural response.

Fig. 10. The activity of the motor reverse output neuron over time. The upper line (primed response) shows activity following infrared activity of the left sensor which stopped at time step 0, and then stimulation of the right sensor starting at time step 10, while the lower line (non-primed response) shows activity following only the left sensor stimulation at time step 10. The increased response of the upper line (e.g. exceeding activity of 0.1 at time step 13 rather than time step 17) is a demonstration of semantic priming.

Fig. 11. The effect of repetition priming. The upper line shows the activity of the reverse motor output following presentation of an object on the left side of the agent (well practiced) and the lower line shows the response following presentation of an object to the right side of the agent (rarely encountered).

close proximity) and would avoid objects so as to remain at some distance from them. Agents which found their way into smaller enclosed areas early on (experiencing many more early collisions) developed the ability to travel very close to objects and follow walls, their bumpers just touching the object/wall without actually setting their bumpers off. Movement any closer would cause a collision; however, the agents were able to consistently perform extremely close range movement without colliding.

Further analysis: Although the task here is simple, the behaviour emerging from the agent is significantly more complex than that of a Braitenberg vehicle (Braitenberg, 1984). As can be seen from Figs. 12 and 13, with the exception of the brief collision at 1680 cycles, from 1200 cycles onward the agent frequently approaches obstacles rather than turning or backing away from them. This agent frequently moves right up to an object before turning away at the very last opportunity (e.g. Fig. 12 bottom centre). Sensory activity between 1200 and 1400 cycles (Fig. 13) shows the agent moving around a cylindrical object (waste paper basket) at very close range (shown by the high level of sensory activity on the left side only). As the agent moves round the basket it is confronted by a wall (1400–1500 cycles) and reverses, turning away from


Fig. 12. Time lapse pictures of the robot in an office environment. (Top left) A collision early in the development of the agent. (Top centre & top right) Successful navigation including a turn near obstacles. (Bottom left) The lead up to the initial collision. (Bottom centre & bottom right) Successful navigation without necessitating a turn at close range following substantial development time.

the basket and back into open space, all without causing a single collision. At 1680 cycles the agent encounters a chair leg head on and turns to the left (as can be seen by the reduction in centre and left sensory activity); however, it does very briefly collide with the object on the right and immediately turns away, heading in a different direction.

The context units shown in Fig. 13 highlight the different situations the agent responds to. Throughout learning we can see the gradual refinement and growing number of context units; however, during early development (up to 1200 cycles) we can see that sensor activity frequently provokes contexts, and hence reactions, which were previously in use during earlier collisions (even where inappropriate, e.g. 300 cycles). At around 900 cycles the agent discovers a crude form of close range behaviour and generates new context units. Subsequent encounters with increased sensory input then make much higher use of these new contexts, rather than the previous collision contexts developed in early learning. This is apparent in the new close range movement behaviours displayed. The maximum allowed number of context units was roughly the same as the number of network units (100–125). Although not clear from Fig. 13, in longer experimental runs it was found (data not shown) that the number of context units saturates before the maximum allowed number was reached.

From Figs. 14 and 15 we can see a very different set of

behaviours emerging from an agent developed without a cortical microcircuit. Firstly, the agent's behaviour is far more stereotypical, having a far greater tendency to turn to the right and spiral around the room backing away from objects. As can be seen between cycles 900 and 1000 (see Fig. 14), the same object comes into range on subsequent full turns of the agent; however, it is slightly further away each time (indicated by subsequent lower peaks). The agent did not cope well with the complex environment, however, as small obstacles such as chair and table legs failed to provoke avoidance. The agent consistently collided with such objects, and continued to turn into them, until it eventually slid round them and spiralled away. This can be seen in Fig. 14 where each collision starts with high right sensor activity, which remains high throughout the collision, while centre and then left sensor activity briefly rise as the agent finally slides round the object to turn away from it. This is a very different pattern of activity than that found in agents incorporating a cortical microcircuit (see Fig. 13).
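The windowed analysis behind Table 2 (the proportion of motor-neuron variability accounted for by a sensor, via Pearson's R over consecutive 500-cycle slices) can be reproduced along the following lines; the function name and the non-overlapping windowing are our own choices, not taken from the paper.

```python
import numpy as np

def variability_explained(sensor, motor, window=500):
    """Proportion of motor-neuron variability accounted for by a sensor
    (squared Pearson correlation) in consecutive, non-overlapping
    windows, in the style of the Table 2 analysis."""
    out = []
    for start in range(0, len(sensor) - window + 1, window):
        s = sensor[start:start + window]
        m = motor[start:start + window]
        r = np.corrcoef(s, m)[0, 1]   # Pearson's R over this window
        out.append(r * r)             # R^2: proportion of variance
    return out
```

Applied to 2000 cycles of recorded sensor and motor activity, this yields the four per-window values reported in each row of Table 2.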

8. Experiment 2: Classical & operant conditioning

Following Pavlov's (1927) descriptions of classical conditioning, a conditioned response will be elicited by any stimulus that is conditioned by repeated exposure to a pairing of that stimulus with one normally eliciting a reflex response. For example, dogs normally salivating when presented with food, and not normally salivating on hearing a bell, can be made to salivate on hearing a bell (with no food present) by repeatedly pairing the bell sound with presentation of food. Such learning does not require a reward signal of any kind as the agent, given some innate set of reflex behaviours, will extend those reflex actions to the novel stimulus. Thus, following the pairing of the food and bell stimulus, as the


Fig. 13. The infrared sensory input activity (shown as lines) and the active context unit (shown as dots) over time. The horizontal bar over the sensory activity shows the sensory activity level at which a collision occurs. Blue indicates the right sensor, red indicates the centre sensor, and green indicates the left sensor. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 14. The sensory activity over time of an agent developed by the same learning algorithm without the addition of a cortical microcircuit. Blue indicates the right sensor, red indicates the centre sensor, and green indicates the left sensor. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

dog normally salivates when presented with food while hearing the bell, it extends that reflex to occur when it hears the bell alone. The introduction of a reward signal allows for a more complex kind of learning, operant conditioning (Skinner, 1938), where the consequences of any particular behaviour can result in modifications to the likelihood of producing that behaviour in the future, such that rewarded behaviour becomes more likely to be produced while punished behaviour becomes less likely to be produced.

In experiment 2 the agent's physical embodiment is extended to include a desk mounted pan-tilt camera (see Fig. 16) which supplies visual stimuli to the agent's DLA control system. Due to the large volume of input provided by the camera (640 × 480 × 3 (RGB) pixel values) the image is scaled down to 10 × 10 × 3 pixel values, and the LSM microcircuit was abandoned (for reasons of computational speed). As before, the input is randomly connected to the ESN microcircuit.

For this experiment, the DLA agent is further modified to

include an additional input (a button press), which is directly connected to the motor output so as to force the agent to move forward. This hard-wired behaviour is then seen as a reflex response analogous to the salivation in the presence of food elicited by Pavlov's dogs. Thus, when the new input button is pressed, the robot necessarily (by hard-wired reflex) moves forward. This direct hard-wired connection conforms to the kind of hard-wired nervous connection Pavlov theorised as responsible for natural reflex responses.

The initial random behaviour of the agents gradually became

organized to produce obstacle avoidance as in experiment 1. Following this, a large bold black arrow on a white card was held up in front of the camera pointing vertically up. The agent produced no particular response to the presence or absence of the card in the visual input, and thus we have the same situation as in Stage 1 of Pavlov's experiments (see Fig. 17).

Presentation of the upward pointing arrow and the button push eliciting a move forwards reflex response were then paired (made to co-occur while the agent remains in open space, i.e. it is not forced to crash), providing the agent with experience of the visual arrow input whilst moving forward, thus mirroring the situation in Stage 2 of classical conditioning. Given experience of this pairing (ranging from 500 to 1000 cycles) the agent was then able to consistently reproduce forward movement responses on presentation of the upward pointing arrow alone (without the additional hard-wired button press). This produces Stage 3 of classical conditioning, where the reflex response has been successfully transferred to a stimulus which previously elicited no discernible external response (see Fig. 17).

The same method was repeated to condition a reverse

(backward) response to a downward pointing arrow. Followinga further 500–1000 cycles of experience the agent was found tobe differentially responsive to upward and downward pointingarrows producing either forward or backwardmotion respectively.When sensory stimulation of the infrared sensors indicated acollision, the agent reverted to obstacle avoidance, returning tovisual discrimination once the obstacle had been avoided. Theagent was then conditioned for left and right pointing arrows toproduce left and right turns respectively. The agent could then bedriven by remote control simply by holding the arrow up in frontof the camera and turning it to point in one of the four conditioneddirections. This is not only a demonstration of conditioned learningbut also a demonstration of complex object recognition, in that thedifferences between an upward pointing arrow, and a downwardpointing arrow are non trivial.The same method was applied to an agent with no cortical

microcircuit which was completely unable to differentiate theupward and downward pointing arrows, consistently producingthe response to which it had been given the most conditioning.This shows how the dynamic reservoirs enable the separation ofpatterns.Classical conditioning can also be used to describe the learning

of the agent in experiment 1, where the unconditioned (reflex)response (UR) of the agent is to change direction when itencounters the unconditioned stimulus (US) (bumper activity).Initially the agent fails to react to activity on the IR sensors,however as this is typically high during a collision it soon becomesa conditioned stimulus (CS). Thus the presence of IR activity (CS)will in future provoke direction changes (CR) before a collisionoccurs. Following classical conditioning, stimuli associated withpunishment (either by experiment or implicitly by the physicalnature of the environment) quickly become avoided. This isdemonstrated in the robot’s successful acquisition of obstacleavoidance in both experiment 1 and experiment 2.Operant conditioning: In operant conditioning (Skinner, 1938,

1950, 1953), the consequences of any behaviour can result inmodifications to the production of that behaviour. In the embodiedrobot, this is again demonstrated by changes in the behavioursleading to obstacle avoidance. In experiment 2, the remote control,provided by presenting arrow stimuli to the camera, initiallyhas the potential to cause collisions; however, on being ‘driven’


Fig. 15. Time lapse pictures of the agent with no cortical microcircuit. (Left) A collision with the boundary. (Right) Backing away while turning results in a corkscrew-like motion. In the more complex office environment (see Fig. 12) this strategy failed to produce robust obstacle avoidance and was consistently produced by agents without a cortical microcircuit.

into obstacles the agent modifies its responses such that the remote control will eventually only work in situations where the visual 'instruction' will not cause a collision. If following the arrow is likely to cause a collision, the agent reverts to avoidance behaviours and then, once the collision has been avoided, the agent reverts back to the behaviour associated with the presented arrow. This hierarchy of behaviour is not unlike that produced by subsumption architectures (Brooks, 1991) in which more complex behaviours take over from simple behaviours; however, in this case the hierarchical ordering of behaviour is a product of experience rather than design.
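In associative terms, the three conditioning stages described above reduce to strengthening connections between co-active reservoir state units and motor units, so that a state pattern repeatedly present during the reflex comes to drive the motor response on its own. A minimal Hebbian sketch (sizes, learning rate, and motor coding are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE, N_MOTOR = 50, 2          # reservoir units, motor outputs (illustrative)
W = np.zeros((N_MOTOR, N_STATE))  # associative weights, grown by the Hebb rule
LR = 0.1

def hebb_step(state, motor):
    """Correlative (Hebbian) update: co-active state and motor units associate."""
    global W
    W += LR * np.outer(motor, state)

def motor_response(state):
    """Motor pattern now associated with this reservoir state."""
    return W @ state

# Stage 1: the 'CS' reservoir state (e.g. evoked by the arrow) elicits nothing.
cs_state = rng.random(N_STATE)
assert np.allclose(motor_response(cs_state), 0.0)

# Stage 2: pair the CS state with the reflex-driven 'move forward' output.
forward = np.ones(N_MOTOR)        # reflex motor pattern (illustrative coding)
for _ in range(20):
    hebb_step(cs_state, forward)

# Stage 3: the CS state alone now drives the forward response.
assert np.all(motor_response(cs_state) > 0)
```

Unbounded Hebbian growth would of course need normalization or decay in a full model; the sketch only illustrates the Stage 1 to Stage 3 transfer.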

9. Experiment 3: Generalization, position invariance, & refinement

In experiment 2 the DLA agent was conditioned to produce specific responses to the presentation of visual stimuli in particular orientations, while retaining the obstacle avoidance behaviour developed in the previous experiment. In experiment 3 this conditioning is extended so that the arrows presented visually change both position and scale within the visual input image. This is implemented by making the camera constantly twitch while varying the distance of the image from the camera between each manipulation of the arrow's direction. Further modifications included the use of different colour arrows with different behavioural mappings. The arrow card is held such that the arrow appears in varying positions within the visual field as the camera moves. As with experiment 2, the agent following experiment 3 displayed generalization in that it was able to respond with the correct direction of movement appropriate to the direction of the arrow presented. This is a more complex task as the position and scale of the arrow are varying within the visual field. It was found that a small number of presentation positions failed to elicit the directional response. Where a 500 neuron Echo State Network was used, more 'blind spots' were found than in cases where larger 1000 neuron networks were used. With either large or small microcircuits, presentation of a moving arrow consistently provoked the anticipated conditioned response even when the arrow image was moving through regions where a stationary arrow image would not produce the conditioned response. Further investigation of the internal states of the system revealed a dual explanation for this result. Firstly, due to priming effects (see experiment 1), motion from a responsive position through a non-responsive region of the visual input provided sufficient residual activity to prime the agent to continue with its original response. Following prolonged testing (stage 3, Fig. 17) this residual priming, acting as a weak source of conditioning (stage 2, Fig. 17), gradually removed these 'blind spots' (irrespective of which microcircuit was used). Secondly, an analysis of the internal states of the DLA architecture showed that moving arrows produced markedly different activity vectors of the cortical microcircuit than stationary arrows. This not only demonstrates the separation properties of the Echo State Network used, but also highlights the use of the network's fading analogue memory in a cognitive task. In general, there remained possible input sequences which provoked an incorrect (unanticipated) response (as they generated significantly different liquid states). In the case of static arrow images in the peripheral visual field, it is anticipated that movement by the agent would (assuming a mounted camera rather than a static camera) provide a means of overcoming this incorrect response. Analysis of the internal states showed these 'blind spots' to generate weak activity in all output units, which is contrasted with a significant winner in the 'non-blind spot' cases.

Despite the apparent difference in the liquid states produced by the moving and non-moving arrow input streams, many similarities are also present, allowing for the identification of the same arrow in each case (for example, the bottom left unit in the top row examples and the second down in the left column of the bottom row examples is consistently active).

Generalization & refinement: The agent was tested to see how it responded to the presentation of coloured arrows (red, green, and blue14). Although the responses of the agent were more intermittent than on presentation of the black arrow, it was still possible to control the agent's movements (although more crudely) by presenting the coloured arrows visually. The colour black, at least when processed by the CMU2 camera used here, contains red, blue, and green. Once the visual input enters the Echo State Network, the resulting state changes appear quite different (due to different random connections for each colour). Despite the changing liquid states, the agent is able to discriminate (although with less accuracy) the direction of the arrows, thereby demonstrating a form of generalization. To complicate the situation, it was decided that blue arrows should be interpreted differently, such that the behavioural mapping is rotated by 90 degrees (i.e. an upward pointing arrow means turn right, right means back away, down means turn left, and left means move forward). The agent was then conditioned as above with this new mapping using the blue arrows. The resulting performance of the agent, having been conditioned with this new mapping for the presentation of blue arrows and the original mapping for the presentation of black arrows, was sufficient to allow for successful remote control using the appropriate mapping for both blue and black arrows (producing the different mappings for each colour) (stage 3, see Fig. 17). Red or green arrows then provoked mixed reactions; however, if one mapping was given more conditioning experience (twice that of the other mapping) then it became

14 Separate presentation of each.


dominant with respect to the red and green coloured arrows15. This demonstrates the refinement of generalized mappings by subsequent conditioning; thus generalization is refined, corrected, and extended through further environmental experience.

As demonstrated in experiment 3, under normal circumstances the DLA agent will make both false generalizations (e.g. from black to blue arrows prior to blue conditioning) and fail to make some correct generalizations (e.g. blind spots); however, through continued experience the agent's architecture will be gradually refined so as to develop those contingencies (sensory-motor) presented through the environment. Experience of false generalizations (such as from black to blue arrows) is quickly transformed given further appropriate conditioning (which can arise through motivational inhibition as in experiment 1). Failure to generalize properly (such as from black arrows in one position within the visual field to black arrows in another position in the visual field) will be overcome during normal experience, as the moving agent (already responding to an arrow cue) fails to recognize the next arrow cue (in a different position); however, as the agent is already moving in the appropriate direction, this will provide experience of the correct contingency to be learned by priming the appropriate behaviour. Although this form of conditioning through priming is weaker and therefore slower than explicit priming, it is sufficient to extend the results of simple conditioning experiments into the agent's more varied environmental experiences.

10. Experiment 4: Phobic responses and cognitive behavioural therapy

At various points in this paper, it has been suggested that the current domain of psychology could be extended through cumulative modelling to provide insight into the internal mental functioning of mind. Although somewhat contentious, experiment 4 is intended to demonstrate one of the potential ways in which this extension could come about.

Phobias: Associative theories of learning lead to the possibility of misplaced associations occurring due to coincidences in an agent's experience. Wolpe (1958) describes a psychotherapeutic account of how conditioning can result in the misplaced association of a stimulus present during a traumatic event to a fear response. This condition is generally known as a phobic response

15 Irrespective of what colour is presented to the camera, in practice at least some of every colour is captured in every pixel by the CMU2 camera.

Fig. 16. The pan-tilt camera and arrow stimulus used in experiment 2. Top right shows sample images of upward and downward pointing arrows captured by the camera.

and can be demonstrated in the DLA agent's aversion to / phobia of narrow corridors (Fig. 18). Those DLA agents following experiments 1, 2, & 3, which tended to remain in open spaces, associate any infrared activity with a collision and display an aversion response (backing away from the obstacle). These agents incorrectly predict collisions when facing a narrow entrance (as infrared activity is typically high in such situations even though a forward response would not cause a collision). This aversion is here interpreted as misplaced, as the agent could easily pass through without causing a collision, and is, therefore, appropriately described by the term phobia. In Cognitive Behavioural Therapy (CBT), a common technique for overcoming phobic reactions is to gradually desensitize the patient in a state of relaxation. This encourages a disassociation of the conditioned stimulus from the fear response which would not normally occur as, without intervention, the patient would have avoided further contact. To model this, the agent is given experience of an additional input during 'safe' situations, so that the new input associates to inhibit / reduce the prediction of a collision (as it is never normally paired with a collision). The new input, which was set to be active if the sum of the IR-sensors' activity was lower than a threshold value, can therefore be used to 'relax' the agent by reducing the expectation of a collision. With this additional input active, the agent is able to approach the narrow passage more closely, due to the reduction in levels of anxiety, and gains substantially more experience contradicting the misplaced association. Following several such approaches (gaining experience contradicting the phobia) the agent is able to traverse the passage and can subsequently do this without the aid of a 'relaxing' input (Fig. 18). This gradual removal of an otherwise robust phobic response provides a clear example of how human intervention can adapt the behaviour of the DLA robot, and further provides a simple example of the kind of empirical investigations the DLA approach makes possible.

Spontaneous recovery of the original phobia was observed throughout the desensitization period and can be explained by the extinction of relationships on some but not all of the potentially applicable contexts. The inability to account for spontaneous recovery was one of the major failures of the Rescorla–Wagner associative model of classical conditioning (Rescorla & Wagner, 1972; Rescorla, Wagner, Black, & Prokasy, 1972; Siegel & Allan, 1996); however, Pavlov (1927) reports the spontaneous recovery of such associations. In our model this can be explained by several contexts applying to a situation, at least one of which has associations extinguished during the unlearning (desensitization) phase. Spontaneous recovery then results from re-establishing non-extinguished contexts, either through differing recent histories, reminder cues, or inhibition of the primary context, all of which have been observed in the agent model developed here. It is suggested that this could also facilitate the reacquisition of a conditioned relationship by re-establishing usurped contexts. There are aspects of conditioning behaviour the model fails to display. These unaccounted-for aspects relate to the differences in strengthening of associations following different reinforcement schemes. For example, Skinner (1938, 1950, 1953) reports that intermittent non-predictable reinforcement (i.e. random within a period of pressing) strengthens behaviour more strongly than consistent reinforcement. This result is not upheld by the model developed here.
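The 'relaxing' input described above can be sketched as an extra cue that, never being paired with collisions, acquires an inhibitory association with the collision prediction. The sketch below uses an error-correcting (Rescorla–Wagner style) update as an illustrative stand-in for the model's associative learning; all weights, rates, and thresholds are assumptions:

```python
# Collision expectation as a simple associative readout over two cues: IR
# activity (an acquired predictor of collision) and a 'relax' input that is
# active only when summed IR activity is below a threshold, i.e. in 'safe'
# situations.
w_ir, w_relax = 1.0, 0.0   # association strengths; IR already conditioned
LR = 0.05
IR_THRESHOLD = 0.3

def relax_input(ir_sum):
    """The additional input: active only in low-IR ('safe') situations."""
    return 1.0 if ir_sum < IR_THRESHOLD else 0.0

def collision_expectation(ir_sum):
    return w_ir * ir_sum + w_relax * relax_input(ir_sum)

# Safe experience: the relax cue is present, no collision ever follows, so it
# acquires an inhibitory (negative) association with the collision prediction.
for _ in range(100):
    ir_sum = 0.1                                  # low IR; relax input active
    error = 0.0 - collision_expectation(ir_sum)   # no collision occurred
    w_relax += LR * error * relax_input(ir_sum)

# With the relax input active, the net collision expectation is suppressed,
# letting the agent approach the narrow passage it previously avoided.
```

Once the agent accumulates real experience of safely entering the passage, the IR-collision association itself weakens, which is why the relax input is eventually no longer needed.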

11. Conclusion

This paper has demonstrated that the use of an untrained reservoir system, such as a Liquid State Machine or Echo State Network, as an input filter to an auto-associative network significantly enhances the ability of that auto-associative network.
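A minimal sketch of the kind of fixed, untrained reservoir meant here (an Echo State Network with random input and recurrent weights; the sizes, sparsity, and spectral radius below are illustrative assumptions, not the paper's parameters):

```python
import numpy as np

rng = np.random.default_rng(1)

N_IN, N_RES = 300, 500   # e.g. a 10 x 10 x 3 pixel input, 500 reservoir units

# Fixed, untrained weights: a random input projection and a sparse random
# recurrent matrix rescaled to spectral radius < 1 (echo state property).
W_in = rng.uniform(-0.1, 0.1, (N_RES, N_IN))
W_res = rng.uniform(-1.0, 1.0, (N_RES, N_RES)) * (rng.random((N_RES, N_RES)) < 0.05)
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))

def esn_step(x, u):
    """One reservoir update: a non-linear expansion of the current input with
    fading memory of recent inputs carried in the recurrent state."""
    return np.tanh(W_res @ x + W_in @ u)

# The associative network learns from the reservoir state x, not the raw
# pixels; temporal context from earlier frames persists in x.
x = np.zeros(N_RES)
for u in (rng.random(N_IN) for _ in range(5)):
    x = esn_step(x, u)
```

No part of this transformation is trained or task specific, which is precisely why the same filter serves every experiment reported above.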


Fig. 17. The three stages of Classical Conditioning. In Stage 1, an unconditioned stimulus elicits an unconditioned response via a hard-wired reflex connection; the conditioned stimulus elicits no particular response. In Stage 2, conditioned and unconditioned stimuli are repeatedly paired. In Stage 3, the conditioned stimulus now elicits the conditioned response independently of the reflex-response stimulus.

Fig. 18. Time lapse pictures of the agent overcoming a phobia through systematic desensitization. (Top left) Initial collisions resulting in phobic avoidance of narrow passages. (Top centre & top right) Agents traversing the passage with additional 'relaxing' input. (Bottom left & bottom right) Agents subsequently traversing the passage without additional 'relaxing' input.

Furthermore, we have demonstrated in a grounded robotic experiment that this single system is capable of producing data matching empirical psychological data in: overt and covert recognition tasks, semantic and repetition priming tasks, schemata production, classical conditioning, operant conditioning, phobia acquisition models, and systematic desensitisation. The resulting DLA model is applied to a wide range of tasks, domains, and modalities of input and, without any task-specific modifications, is able to display all these phenomena simultaneously. We therefore conclude that this extension does not simply enhance the auto-associative model, but provides a method of cognitive modelling for studying psychological phenomena in an embodied, integrated (i.e. not in isolation), grounded, general purpose (i.e. not task specific), and plausible (both biologically and psychologically) manner.

The task of identifying a moving object (such as an arrow) is a non-linear task, as no particular input pixel from the images remained specific to one or other of the objects used in experiment 2. This is confirmed as the agent without a cortical microcircuit is completely unable to differentiate between them. Agents with cortical microcircuits (either LSM or ESN) were both able to successfully differentiate the upward and downward pointing arrows and have therefore demonstrated the acquisition of behaviour requiring non-linear separation over the original input data. Rather than this ability being generated by design or through the use of supervisory signals not normally available in the real world, this has been achieved by a task-independent method of expansion of a data stream. Not only does this lead to an extension of the abilities of the uninformed learning algorithm in use, but even where solutions are possible without this extension (as in


experiment 1) the quality and richness of behaviour displayed is significantly enhanced by the inclusion of the microcircuit.

The extension of uninformed learning presented here is intended to be independent of the particular algorithm in use; rather, the expansion of a data stream (without task-specific design) by a cortical microcircuit should bolster the abilities of any uninformed learning algorithm one decides to use. Even the relatively simple Hebbian plasticity used herein is able to produce highly complex exploratory behaviour, avoiding collisions and identifying complex moving visual stimuli to produce an appropriate response while remaining attentive to the more basic avoidance task. This is a clear extension over the abilities of the standard algorithm, and an extension of the limits normally associated with uninformed algorithms in general.

In seeking methodologies for robotic development beyond task-specific design, this paper has identified the limit imposed by separation, which results from poverty of stimulus, and can be alleviated through warped expansion of data streams. Future research aims to further demonstrate the complex learning abilities made possible by this approach in a more complex robotic embodiment, and to provide optimization of cortical microcircuits for separation tasks.
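The separation limit, and its removal by untrained expansion, can be illustrated on the classic XOR problem, which no linear readout (and hence no single-layer perceptron or simple associative learner) can solve on the raw inputs. The random non-linear expansion below is a stand-in for the cortical microcircuit, with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)

# XOR: the classic task that is not linearly separable in its raw inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def linearly_solvable(features, targets, tol=0.25):
    """Fit a least-squares linear readout (with bias) and check whether it
    reproduces every target to within tol."""
    A = np.hstack([features, np.ones((len(features), 1))])
    w, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return bool(np.max(np.abs(A @ w - targets)) < tol)

assert not linearly_solvable(X, y)   # raw inputs: best linear fit is 0.5 everywhere

# A fixed random non-linear ('warped') expansion of the same data stream,
# untrained and task independent, makes the task solvable for the very same
# linear readout.
F = np.tanh(X @ rng.normal(size=(2, 20)) + rng.normal(size=20))
assert linearly_solvable(F, y)
```

The expansion here is static; a reservoir additionally mixes in temporal context, which is what allowed the moving-arrow discriminations above.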

Acknowledgments

This work has been partly supported by a European Commission grant to the project ''Integrating Cognition, Emotion and Autonomy'' (ICEA, IST-027819, www.iceaproject.eu) as part of the European Cognitive Systems initiative. The experiments were carried out as part of Anthony Morse's DPhil at the University of Sussex.

References

Beer, R. (1990). Intelligence as adaptive behavior: An experiment in computational neuroethology. San Diego, CA, USA: Academic Press Professional, Inc.

Beer, R. (2000). Dynamical approaches to cognitive science. Trends in Cognitive Sciences, 4(3), 91–99.

Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. MIT Press.

Brooks, R. (1991). Intelligence without reason. In Proceedings of the 1991 international joint conference on artificial intelligence (pp. 569–595).

Bruce, V., & Valentine, T. (1986). Semantic priming of familiar faces. The Quarterly Journal of Experimental Psychology. A, Human Experimental Psychology, 38(1), 125–150.

Burton, A., Bruce, V., & Hancock, P. (1999). From pixels to people: A model of familiar face recognition. Cognitive Science, 23(1), 1–31.

Carpenter, G., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37(1), 54–115.

Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines: And other kernel-based learning methods. Cambridge University Press.

Clark, A., & Thornton, C. (1997). Trading spaces: Computation, representation, and the limits of uninformed learning. Behavioral and Brain Sciences, 20(1), 57–90.

Fernando, C., & Sojakka, S. (2003). Pattern recognition in a bucket. In Proceedings of ECAL.

Grossberg, S. (1978). A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. Progress in Theoretical Biology, 5, 233–374.

Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11(1), 23–63.

Gupta, A. (2000). Organizing principles for a diversity of GABAergic interneurons and synapses in the neocortex. Science, 287(5451), 273–278.

Gupta, A., Silberberg, G., Toledo-Rodriguez, M., Wu, C., Wang, Y., & Markram, H. (2002). Organizing principles of neocortical microcircuits. Cellular and Molecular Life Sciences.

Gurney, K. (1997). An introduction to neural networks.

Hebb, D. (1949). The organization of behavior. New York: Wiley.

Hopfield, J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558.

Jaeger, H. (2001a). The echo state approach to analysing and training recurrent neural networks. GMD Report 148. German National Institute for Computer Science.

Jaeger, H. (2001b). Short term memory in echo state networks. GMD Report 152. German National Institute for Computer Science.

Jaeger, H. (2002a). Adaptive nonlinear system identification with echo state networks. In Paper presented at the neural information processing systems 2002.

Jaeger, H. (2002b). Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the ''echo state network'' approach. GMD-Forschungszentrum Informationstechnik.

Kirsh, D. (1992). From connectionist theory to practice. In Davis (Ed.), Connectionism: Theory and practice. New York: O.U.P.

Maass, W., Natschlager, T., & Markram, H. (2002a). Computational models for generic cortical microcircuits. In Computational neuroscience: A comprehensive approach. CRC Press.

Maass, W., Natschlager, T., & Markram, H. (2002b). A model for real-time computation in generic neural microcircuits. In Paper presented at the neural information processing systems 2002.

Maass, W., Natschlager, T., & Markram, H. (2002c). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531–2560.

Markram, H., Wang, Y., & Tsodyks, M. (1998). Differential signaling via the same axon of neocortical pyramidal neurons. Proceedings of the National Academy of Sciences.

McClelland, J., & Rumelhart, D. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88(5), 375–407.

McClelland, J., Rumelhart, D., & the PDP Research Group (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: MIT Press.

Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.

Morse, A. (2003). Autonomous generation of Burton's IAC cognitive models. In Paper presented at the European conference of cognitive science 2003.

Morse, A. (2005a). Psychological aLife: Bridging the gap between mind and brain; enactive distributed associationism & transient localism. In A. Cangelosi, G. Bugmann, & R. Borisyuk (Eds.), Modeling language, cognition, and action: Proceedings of the ninth conference on neural computation and psychology: Vol. 16 (pp. 403–407). World Scientific.

Morse, A. (2005b). Scale invariant associationism, liquid state machines, & ontogenetic learning in robotics. In Paper presented at developmental robotics.

Morse, A. (2006). Cortical cognition: Associative learning in the real world. DPhil thesis. UK: Department of Informatics, University of Sussex.

Morse, A., & Ziemke, T. (2007). Cognitive robotics, enactive perception, and learning in the real world. In Paper presented at cognitive science 2007.

Mountcastle, V. (1998). Perceptual neuroscience: The cerebral cortex. Harvard University Press.

Page, M. (2000). Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23, 443–512.

Pavlov, I. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. Translated by G. V. Anrep. London: Oxford University Press.

Rescorla, R., & Wagner, A. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical Conditioning II: Current Research and Theory, 64–99.

Rescorla, R., Wagner, A., Black, A., & Prokasy, W. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Classical conditioning II: Current research and theory. Appleton-Century-Crofts.

Rumelhart, D., McClelland, J., & the PDP Research Group (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1 (pp. 1–547). Cambridge, MA: MIT Press.

Siegel, S., & Allan, L. (1996). The widespread influence of the Rescorla–Wagner model. Psychonomic Bulletin & Review, 3(3), 314–321.

Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New Jersey: Prentice Hall.

Skinner, B. F. (1950). Are theories of learning necessary? Psychological Review, 57(4), 193–216.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Thornton, C. (2000). Truth from trash: How learning makes sense. MIT Press.

Wolpe, J. (1958). Psychotherapy by reciprocal inhibition. Stanford, CA: Stanford University Press.

Young, A., & Burton, A. (1999). Simulating face recognition: Implications for modelling cognition. Cognitive Neuropsychology, 16(1), 1–48.