
Neurocomputing 71 (2007) 13–29

www.elsevier.com/locate/neucom

Challenges for large-scale implementations of spiking neural networks on FPGAs

L.P. Maguire, T.M. McGinnity, B. Glackin, A. Ghani, A. Belatreche, J. Harkin

Intelligent Systems Engineering Laboratory, School of Computing and Intelligent Systems, Magee Campus, University of Ulster, Derry, Northern Ireland, BT48 7JL, UK

Available online 28 July 2007

doi:10.1016/j.neucom.2006.11.029

Corresponding author. E-mail addresses: [email protected] (L.P. Maguire), [email protected] (B. Glackin).

Abstract

The last 50 years have witnessed considerable research in the area of neural networks, resulting in a range of architectures, learning algorithms and demonstrative applications. A more recent research trend has focused on the biological plausibility of such networks, as a closer abstraction to real neurons may offer improved performance in an adaptable, real-time environment. This poses considerable challenges for engineers, particularly in terms of the requirement to realise a low-cost embedded solution. Programmable hardware has been widely recognised as an ideal platform for the adaptable requirements of neural networks, and considerable research has been reported in the literature. This paper aims to review this body of research to identify the key lessons learned and, in particular, to identify the remaining challenges for large-scale implementations of spiking neural networks on FPGAs.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Field programmable gate arrays (FPGAs); Hardware implementation; Spiking neural network (SNN)

1. Introduction

Inspired by the way the brain processes information, scientists and engineers have been researching neural networks (NNs) since the early 1940s [44]. NNs are an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system: it is composed of a large number of highly interconnected processing elements, neurons, working in parallel to solve a specific problem.

The neuron is made up of four main parts: dendrites, synapses, axon and the cell body. A neuron is essentially a system that accepts electrical currents arriving on its dendrites. It sums these and, if they exceed a certain threshold, it issues a new pulse which propagates along an axon. The information is transmitted from an axon to a dendrite via a synapse, by means of chemical neurotransmitters across the synaptic membrane. An illustration of a biological neuron can be found in Fig. 1.

Fig. 1. Illustration of a biological neuron [6].

Research into artificial neural networks (ANNs) has seen the development of a plethora of neuron models, from the initial McCulloch and Pitts concept [44] to more biologically realistic spiking models [77]. Recent trends in computational intelligence have indicated a strong tendency towards forming a better understanding of biological systems and the details of neuronal signal processing [31,39,62,63]. Such research is motivated by the desire to form a more comprehensive understanding of information processing in biological networks and to investigate how this understanding could be used to improve traditional information processing techniques [41,67]. Spiking neurons differ from conventional ANN models in that information is transmitted by means of spikes rather than by firing rates [7,22,27,55,60]. It is believed that this allows spiking neurons to have richer dynamics, as they can exploit the temporal domain to encode or decode data in the form of spike trains [58,59]. However, this has demanded the development of new learning rules, drawing again on inspiration from biology. For example, Hebbian learning has been identified as a closely biologically related learning




rule [28], and more recent research has reported a spike-timing-dependent variation of this rule, called spike timing dependent plasticity (STDP) [4,51,52,70], which modulates synaptic efficacy.

Software simulation of network topologies and connection strategies provides a platform for investigating how arrays of spiking neurons can be used to solve computational tasks. Such simulations face a scalability problem, in that biological systems are inherently parallel in their architecture, whereas commercial PCs are based on the sequential von Neumann processing architecture. Thus, it is difficult to assess the efficiency of these models in solving complex problems [10]. When implemented on parallel hardware, NNs can take full advantage of their inherent parallelism and run orders of magnitude faster than software simulations, thus becoming appropriate for real-time applications. Developing custom application-specific integrated circuit (ASIC) devices for NNs, however, is both time consuming and expensive. These devices are also inflexible, in that a modification of the basic neuron model would require a new development cycle. Field programmable gate arrays (FPGAs) are devices that permit the implementation of digital systems, providing an array of logic components that can be configured in a desired way by a configuration bitstream [72]. The device is reconfigurable, such that a change to the system is easily achieved and an updated configuration bitstream can be downloaded to the device. Previous work has indicated that these devices provide a suitable platform for the implementation of conventional ANNs [5,19].

This paper aims to report on the issues arising from the authors' experience in implementing spiking neural networks on reconfigurable hardware. This enables the identification of a number of challenges facing the area in terms of creating large-scale implementations of spiking neural networks on reconfigurable hardware, particularly implementations that operate in real time yet demonstrate biological plausibility in terms of the adaptability of the architecture. The paper is structured as follows: Section 2 provides a review of the literature on related work in realising both classical and spiking neural networks in hardware; Section 3 summarises a number of approaches that the authors are proposing to realise large-scale implementations of spiking neural networks on FPGAs; and Section 4 concludes the paper with a discussion of the range of different approaches and remaining challenges.

2. Background research

A range of research results has been reported in the literature on the implementation of NNs on both ASIC and reconfigurable platforms. ASIC-based approaches are traditionally referred to as neuro-processors or neurochips. FPGA-based implementations are a comparatively new approach, dating from the early 1990s [80]. There are three important aspects in implementing NNs on reconfigurable hardware; in this section the authors classify and review them as implementation of learning algorithms, multiplier optimisation schemes, and signal representation, for both classical and spiking NNs.

2.1. Implementations of classical neural networks

Initial work by Eldredge [13] saw the successful implementation of the backpropagation algorithm using Xilinx XC3090 FPGAs; the architecture, called the run-time reconfiguration ANN (RRANN), was shown to be able to learn to approximate centroids of fuzzy sets. Largely influenced by Eldredge's architecture, Beuchat et al. [3] developed an FPGA platform called RENCO (reconfigurable network computer). It contained four Altera FLEX 10K130 FPGAs that can be reconfigured and monitored over any local area network (LAN) via an onboard 10Base-T interface; the intended application was hand-written character recognition. Ferrucci and Martin [14,42] built a custom platform, called the adaptive connectionist model emulator (ACME), which consists of multiple Xilinx XC4010 FPGAs. ACME was successfully validated by implementing a 3-input, 3-hidden-unit, 1-output network used to learn the 2-input XOR problem [14]. Skrbek also used this problem to prove that his own custom backpropagation-based FPGA platform, the ECX card, could perform this task efficiently [68]. The ECX card was also capable of implementing radial basis function (RBF) NNs, and was validated using pattern recognition applications such as the parity problem, digit recognition, the inside-outside test, and sonar signal recognition.

A number of alternative models were investigated by other researchers. For example, Perez-Uribe's research [54] was motivated by the premise that NNs can be used to adaptively control robots, and he employed evolutionary neural networks to this end [79]. He implemented ontogenic neural networks on a custom FPGA platform called flexible adaptable-size topology (FAST). The platform was used to implement three different kinds of unsupervised, ontogenic NNs. This architecture was the first of its kind to use unsupervised ontogenic NNs, but


somewhat limited, since it can only handle problems which require dynamic categorisation or online clustering.

In contrast to Perez-Uribe, De Garis et al. [16] implemented an evolutionary neural network and managed to achieve on-chip learning. Largely influenced by MIT's CAM project, De Garis designed an FPGA-based platform called the CAM-brain machine (CBM), where a genetic algorithm (GA) is used to evolve a cellular automata (CA) based NN. Although the CBM qualifies as having on-chip learning, no learning algorithm was explicitly included in the CA. Instead, localised learning occurs indirectly, by first evolving the genetic algorithm phenotype chromosome and then letting the topology of an NN module grow, which is a functional characteristic of cellular automata. The CBM was capable of supporting up to 75 million neurons, making it the world's largest evolving NN, with thousands of neurons evolved in a few seconds. The CBM proved successful in function approximation applications. De Garis's long-term goal is to use the CBM to create extremely fast and large-scale modular NNs which can be used for applications inspired by the human brain. Nordstrom [47] also explored modular neural networks in an FPGA-based platform called REMAP (real-time embedded modular adaptive parallel processor). Nordstrom considered that reconfigurable computing could provide a suitable platform to easily support different types of modules; such a medium could be used to create a heterogeneous modular NN like the 'hierarchy of experts' proposed by Jordan and Jacobs [37].

The multiplier has been identified as the most area-intensive arithmetic operator used in FPGA-based ANNs [47,54]. In order to maximise the processing density, a number of multiplier reduction schemes have been attempted. Researchers have tried bit-serial multipliers [13,71] and multipliers of reduced range-precision [48]. It is also possible to choose a signal representation that eliminates the need for traditional multipliers altogether, replacing them with a less area-intensive logic operator. Perez-Uribe considered using a stochastic spike-train signal for his FAST neuron architecture, where multiplication of two independent signals could be carried out using a two-input logic gate [54]. Nordstrom implemented a variant of REMAP for use with sparse distributed memory, which allowed each multiplier to be replaced by a counter preceded by an exclusive-or logic gate [46,71]. Another approach is to limit values to powers of two, thereby reducing multiplications to simple shifts that can be achieved in hardware using barrel shifters [5,54]. Time-multiplexing has traditionally been used as a means to reduce the quantity, as opposed to the range-precision, of the multipliers used in neuron calculations [40,54]. Eldredge's time-multiplexed algorithm [13] is the most popular and intuitive version used in backpropagation-based ANNs; it uses only one synaptic multiplier per neuron, shared among all inputs connected to that neuron.
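The logic-gate multiplication scheme mentioned above exploits a standard result from stochastic computing: if two independent bitstreams encode the values p1 and p2 as bit probabilities, their bitwise AND encodes the product p1·p2. A minimal Python sketch (stream length and values are illustrative; this is not the FAST implementation itself):

```python
import random

def stochastic_stream(p, n):
    """Encode a value p in [0,1] as an n-bit stream where P(bit=1) = p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def and_multiply(stream_a, stream_b):
    """Bitwise AND of two independent stochastic streams encodes the product."""
    return [a & b for a, b in zip(stream_a, stream_b)]

random.seed(0)
n = 100_000
a, b = stochastic_stream(0.6, n), stochastic_stream(0.5, n)
product = and_multiply(a, b)
print(sum(product) / n)  # ~0.30 = 0.6 * 0.5; accuracy improves with stream length
```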

ANN performance is highly dependent on the range and precision of the signals used [49]. Limiting the numerical range-precision leads to an increase in quantisation error. In the case of the backpropagation algorithm, too large a quantisation error may cause the path of the gradient descent to deviate from its intended direction; the convergence rate of backpropagation ANNs is therefore sensitive to the range-precision of the signal representation used. Given enough quantisation error in the neuron weights, similar scenarios can be seen in other types of ANNs. For this reason alone, the choice of range-precision and type of signal representation is one of the most important design decisions for a given ANN architecture. Four common types of signal representation are frequency, spike train, floating-point and fixed-point. Frequency-based representation is categorised as time-dependent because it counts the number of analogue spikes in a given time window, while a spike train is categorised as time- and space-dependent because the information is based on the spacing between spikes and is delivered in the form of a real number within each clock cycle. The CBM [16] used spike interval information encoding (SIIC), a pseudo-analogue signal representation that extracts multi-bit information from a 1-bit temporally encoded signal using a convolution filter. This architecture used a genetic algorithm to grow and evolve cellular-automata-based ANNs and was targeted at neurocontroller applications. Although it limits evolvability, De Garis selected a spike-train signal representation to ensure the CAM-brain logic density would fit on the limited-capacity FPGAs (Xilinx XC6204 FPGAs).

Fixed-point arithmetic is the most popular signal representation used in FPGA-based ANN architectures. This is because fixed-point has traditionally been more area-efficient than floating-point [40,54] and is not as severely limited in range-precision as the frequency and spike-train signal representations. Eldredge's RRANN architecture [13] used fixed-point representation of various bit lengths; the motivation was to determine empirically the range-precision and bit length of the individual backpropagation parameters. Perez-Uribe [54] used 8-bit fixed-point in all three variants of the FAST architecture. ACME [14,42] used 8-bit fixed-point to converge the logical-XOR problem using backpropagation learning, as did Skrbek [68] on the ECX card.

Skrbek showed how limited range-precision leads to longer convergence times [68] by dropping the resolution of the logical-XOR problem from 16-bit to 8-bit fixed-point. Eldredge [13] recommended using a high range-precision of 32-bit fixed-point in future implementations of RRANN for more uniform convergence, and Taveniku [11] argued for increased precision up to 32-bit fixed-point for future ASIC versions of REMAP. However, reducing the range-precision helps to minimise the hardware area consumed. Therefore, the selection of the range-precision


for the signal type is a trade-off between convergence rateand area.
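To make the range-precision trade-off concrete, the sketch below quantises a value to signed fixed-point with saturation; the (16, 8) and (8, 6) formats echo those quoted later in Section 3.1, while the 4-bit case is added purely to exaggerate the quantisation error:

```python
def to_fixed(x, frac_bits, total_bits):
    """Quantise x to signed fixed-point with the given widths, saturating."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))       # most negative representable code
    hi = (1 << (total_bits - 1)) - 1    # most positive representable code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale                 # the value actually stored in hardware

w = 0.7391
for total, frac in [(16, 8), (8, 6), (4, 2)]:
    q = to_fixed(w, frac, total)
    print(f"{total}-bit (Q{total - frac}.{frac}): {q:+.4f}, error {abs(w - q):.4f}")
```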

When considering the implementation of neural topologies on FPGA hardware, it is also important to consider the impact of the chosen design route. Traditionally, the most common hardware description languages for FPGA design are VHDL (very high speed integrated circuit (VHSIC) hardware description language) and Verilog; however, higher-level tools such as Xilinx system generator (XSG), or electronic system level (ESL) languages such as SystemC and Handel-C, are becoming increasingly popular [17,45,53]. A study by Ortigosa et al. comparing VHDL and Handel-C implementations of a multilayer perceptron (MLP) network, with regard to silicon area usage, data throughput and required computational resources, indicated that although slightly better performance was obtained using the VHDL design route, this was compensated for by a factor of 10 decrease in design time using Handel-C [50].

A recent review paper on FPGA implementations of NNs discusses these issues further [80]. Previous work by the authors concentrated on realising hybrid neural networks on FPGAs and developed alternative techniques to implement the non-linear transfer function [5] and the inherent connections required to implement the network [19]. The latter approach compared intrinsic and extrinsic training approaches and assumed that a fully interconnected strategy was available but employed only those connections that were required during training; thereafter the architecture was fixed.

2.2. Implementations of spiking neural networks

Before discussing the hardware implementation of spiking neural networks (SNNs), it is first necessary to review relevant work on sequential simulators. Currently, there are numerous environments for SNNs that enable the construction of biophysically realistic models. One such example is NEURON [81], a simulation environment for developing and exercising models of neurons and networks of neurons, which is particularly suited to problems where the cable properties of cells play an important role and where cell membrane properties are complex, involving many ion-specific channels and ion accumulation. It evolved from the Department of Neurobiology at Duke University, where the goal was to create a tool designed specifically for solving the equations that describe nerve cells. GENESIS [82] is a further example of a general-purpose simulation platform, developed to support the simulation of neural systems ranging from subcellular components and biochemical reactions to complex models of single neurons, simulations of large networks, and systems-level models. SpikeNNS, which supports simulations with spiking neurons, is an extension of the Stuttgart neural network simulator (SNNS) [83], a public-domain sequential simulator developed at the University of Stuttgart which utilises an online backpropagation training method. The development of this simulator was inspired by another sequential simulator, the Rochester connectionist simulator (RCS) [23]. The RCS is a simulator program for arbitrary types of NNs and includes a backpropagation training package and a SunView interface. The current version of SpikeNNS implements several learning rules for a self-organisation process with spiking neurons. The spiking neural simulator (SNS), developed at the University of Stirling in Scotland [69], is intended as a research tool for investigating spiking neural systems and, particularly, their use in spike-based auditory processing. In this system, once the network has been set up and the input files determined, the network is simulated using a fixed time step for a predetermined length of time. The user can then choose to save spikes, alter the display, try different sets of inputs or synapses, and run the simulation again.

Addressing the issue of significantly increased computation times when large-scale networks are targeted, SpikeNET [11] is an event-driven neural simulator for modelling large networks of integrate-and-fire neurons. In contrast to a time-driven simulation, which is generally used to simulate SNNs, simulators utilising an event-based approach require fewer computational resources due to the low average activity of typical networks. Neurons are simulated with a limited number of parameters that include classical properties like the post-synaptic potential and threshold. SpikeNET can be used to simulate networks with millions of neurons and hundreds of millions of synaptic weights; optimisation of computation time and the aim of real-time computation have been driving forces behind the development of this system. Milligan [84] takes an alternative approach, which aims to mitigate some of the problems associated with modelling networks of spiking neurons, namely connection management, time management, and management of communication between neurons. Another simulation environment, SPIKELAB [25], developed at the University of Bonn, incorporates a simulator that is able to simulate large networks of spiking neurons using a distributed event-driven simulation strategy; it is also possible to integrate digital or analogue neuromorphic circuits into the simulation process. Amygdala [85] is open-source software for SNNs. Still in its early stages, Amygdala has already been used to produce working NNs, and continued development is focused on increasing its efficiency and usefulness and on laying the groundwork for several important advanced features. The biological neural network (BNN) toolbox [86] is MATLAB-based software capable of simulating networks of biologically realistic neurons. This software enables a user to create and simulate various BNN models easily using built-in library models; a user can also create custom models and add them to the library using library templates. A set of descriptive examples is available to give a quick introduction to the toolbox and to reduce coding time. The toolbox covers only spiking models of neurons and biologically plausible network components.


While the software-based simulators described here are all viable platforms, the large simulation times incurred, particularly for large-scale SNNs, have led researchers to investigate custom hardware solutions. The various approaches can be categorised into three major domains: ASIC, digital signal processing (DSP) accelerator, and FPGA-based implementations.

The neurocomputer for spike-processing neural networks (NESPINN) and the memory-optimised accelerator for spiking neural networks (MASPINN) are examples of systems that utilise accelerator boards connected to a host computer to accelerate SNN simulation times [34,66]. Both systems employ ASIC neuroprocessor chips with a degree of flexibility, in that they permit the configuration of a neuron model with up to 16 dendrite potentials of different functionality, e.g. inhibitory, excitatory and multiplicative [65]. Another notable ASIC-based approach is that proposed by Schemmel et al. [64], which concentrates on creating a platform for simulating large-scale SNNs while maintaining a speed several orders of magnitude faster than real time. The synapse model includes an implementation of STDP, thus providing on-chip learning, and the approach contains a framework for enabling several chips to operate in parallel, such that it should be feasible to build networks of the order of 10,000 neurons. Taking an alternative approach and concentrating on creating a hardware system closely aligned to biological principles, Renaud et al. have developed a neural simulation platform designed as a tool for computational neuroscience [38,56]. The system incorporates custom-designed ASIC chips in which the neuron model is based on the computationally intensive but biologically plausible Hodgkin–Huxley (HH) neuron model. The system computes in real time and emulates in analogue mode the electrical activity of single neurons or small NNs; it has also been used in the implementation of 'hybrid networks', where living and artificial neurons interact in real time in a mixed NN. Recent collaborative work between the Intelligent Systems Engineering Laboratory (ISEL) group and the University of Liverpool is investigating novel approaches to realise compact models of SNNs on dedicated hardware using charge-coupled synapses [7–9].

Entering the domain of DSP-accelerator-based approaches, ParSPIKE was developed by Hartmann et al. [74]. This was an enhanced version of the SPIKE128k architecture [26], using Eckhorn model neurons for its simulations, and its advantages were demonstrated with simulation results from a typical large vision network. The aim of an accelerator system under these conditions is acceleration of the sequential simulation algorithm, reduction of the number of neurons and connections to be updated per time slot, and distribution of the neurons onto as many parallel units as possible. It is reported that communication problems were solved by exploiting the low spike rates in the massively parallel architecture: because updates of neuron parameters are held in local nodes, the main part of the calculation requires no additional communication. It was shown that the implementation is less expensive than the previous SPIKE128k system. A board for irregular connections may simulate up to 256k neurons with up to 32M synapses, while a board for regular connections can simulate up to 512k neurons; boards for irregular connections contain 16 DSPs and the memory subsystem, while boards for regular connections consist only of DSPs and hardware switches. The prototype architecture, with two irregular-connection boards and one regular-connection board on a VME bus, can simulate up to one million neurons.

The third major domain of hardware-based implementations, the FPGA-based approach, has been adopted by a number of research groups. These devices provide a large degree of flexibility for developing SNNs for a particular task without committing to costly silicon ASIC fabrication, whilst also providing a platform suitable for rapid prototyping. In addition, digitally based SNNs provide a number of other desirable features, such as noise robustness and simple real-world interfaces. Roggen et al. [57] reported a cellular SNN model with reconfigurable connectivity; the model was implemented on an FPGA, and an array of 64 neurons was successfully demonstrated in an obstacle avoidance task for a small mobile robot. Upegui et al. from the Ecole Polytechnique Federale de Lausanne (EPFL) have presented a functional model of a spiking neuron intended for hardware implementation [73]; the model allows the design of speed- and area-optimised architectures. Targeting guaranteed real-time performance, Ros et al. [61] developed a software/hardware computing platform incorporating a peripheral component interconnect (PCI) based FPGA accelerator board to perform pipelined neural computations. At present the network learning is performed in software; however, the authors state their intention to migrate this capability to the hardware component of the system. The overall goal of the work is to investigate biologically realistic models for real-time robotic control operating within closed action-perception loops. The importance of embedding the learning and the routing of spikes on the FPGA, to exploit its inherent parallel computing resources, is acknowledged; however, the difficulties this incurs are also noted, as it requires the network topology to be stored and managed completely by the hardware component [29]. An illustrative example, also employing a pipelined computing structure but using a more complex underlying model, has been reported by Graas et al. [24].

In the model of Upegui et al. [73], some features of biological spiking neurons are abstracted while preserving the functionality of the network, in order to define an architecture which is then implemented on the FPGA device. The model permits the designer to optimise for area or speed according to the application. In the same way, several parameters and features are optional, in order to allow more biologically




plausible models, at the price of increasing the complexity and hardware requirements of the model. Maya et al. [43] reported a compact SNN implementation on FPGAs, in which the neuron is composed of a synaptic module and a summing-activation module. The architecture of the neuron is characterised and its FPGA implementation presented; the XOR network is compiled for a Xilinx Virtex XCV50-6 device and exhibits a resource utilisation of 240 slices with a maximum operating frequency of 90 MHz.

The authors have also reported research results in the area of computational implementations of SNNs [1,2,22,36,71,76], ranging from the development of training algorithms to their implementation on FPGAs; these will be discussed in detail in the subsequent section. From the survey of the various approaches it is evident that software-based approaches remain the dominant platform for researchers in this area. Hardware implementations either aim to create high-speed implementations of a small number of biologically plausible neurons, to assist neuroscientists in exploring the properties of real neurons, or alternatively investigate hardware platforms that enable computational applications to be accommodated. In general, the research concentrates on two aspects: signal representation/efficient architectures, and large-scale implementations. The following sections highlight an example of each.


3. Implementation approaches

Considerable research has been undertaken in developing an understanding of the behaviour of a biological neuron. However, fewer research results are available on how large arrays of these interconnected neurons combine to form powerful processing systems. The HH spiking neuron model [30] is representative of the characteristics of a real biological neuron; however, the model consists of four coupled non-linear differential equations, which are associated with time-consuming software simulations and would incur high 'real-estate' costs when


a hardware implementation is targeted. Thus hardware implementations demand the application of a simpler neuron model. The level of abstraction has a significant effect on the network sizes that can be implemented, as discussed in the following sections.

3.1. Large-scale implementations using a multiplier-less strategy

The basic form of the integrate-and-fire neuron model is computationally relatively simple and can therefore be implemented efficiently on FPGA hardware. Mathematically it can be described as

$$\tau_m \frac{\partial u}{\partial t} = -u(t) + R\,I(t). \qquad (1)$$

This equation describes the evolution of the membrane potential u over time, where τ_m is the membrane time constant with which the voltage leaks away; a spike is generated when the voltage u crosses the threshold. Using this model it is possible to omit completely the multipliers required for synapse/neuron computation [18]. An overview of the proposed architecture is shown in Fig. 2, where the inputs are spike trains, much like those of real neurons: incoming spatio-temporal spikes are accumulated and, once the total exceeds a threshold, an output spike is generated.

Fig. 2. Spike train generation.

The circuitry required for this spike generation is shown in Fig. 3. Input signals are encoded as a pulse count and spatio-temporal synapses are accumulated in a membrane. In Fig. 3 the synapses are modelled with AND gates and, in order to make the membrane more biologically plausible, random noise is injected into the membrane potential. An output spike is generated once the membrane potential exceeds the threshold. The only drawback of such an architecture is that its accuracy is inferior to that of fully digital arithmetic operation.

Fig. 3. Hardware architecture for multiplier-less implementation.
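For illustration, a behavioural Python sketch of Eq. (1), integrated with forward Euler and a threshold-and-reset rule (the parameter values here are illustrative, not those of the hardware implementation):

```python
def simulate_lif(input_current, dt=0.125, tau_m=10.0, r=1.0,
                 v_thresh=1.0, v_reset=0.0):
    """Forward-Euler simulation of tau_m * du/dt = -u(t) + R*I(t), time in ms."""
    u, spikes = v_reset, []
    for step, i_t in enumerate(input_current):
        u += (dt / tau_m) * (-u + r * i_t)  # leaky integration, Eq. (1)
        if u >= v_thresh:                   # threshold crossing -> spike
            spikes.append(step * dt)
            u = v_reset                     # reset membrane potential
    return spikes

# A constant supra-threshold drive produces regular firing:
print(simulate_lif([1.5] * 800))  # spike times over 100 ms of simulated time
```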

In this work the authors focused on a fully parallel implementation of this integrate-and-fire model on a Xilinx Virtex-II Pro FPGA device (XC2VP50). The model was simulated first in software, in order to validate its functionality before being implemented in hardware; the simulations were based on a discrete solver and employed a variable time step with a maximum step size of 0.125 ms. The algorithm was tested in MATLAB, and the Simulink/Xilinx system generator toolbox was used to graphically design the architecture and to simulate the timing and behaviour of the model. One advantage of using Simulink is that models can be built up quickly from libraries of pre-built blocks; the Xilinx libraries used within system generator contain bit-true and cycle-accurate models of the FPGAs and plug into Simulink. HDL code is then generated, behavioural simulations take place, and the functionality of the design is verified.

Based on this investigation, the resource usage is shown in Table 1, while the capabilities in terms of network size that can be implemented on the XC2VP50 device are shown in Table 2. The approach requires 8 slices for a 1:1 implementation (1 synapse per neuron), 80 slices for a 1:10 implementation (10 synapses per neuron) and 800 slices for a 1:100 implementation (100 synapses per neuron). In Fig. 3, random fixed weights are generated and compared to achieve the required synaptic strength; the strength of the input signal is encoded as a number of pulses, and a synaptic weight is assigned according to that strength.

Table 1. Resource estimation for multiplier-less implementation

  Neuron:synapse ratio   Slices   Embedded multipliers
  1:1                    8        0
  1:10                   80       0
  1:100                  800      0

Table 2. SNN network capabilities for multiplier-less implementation

  Neuron:synapse ratio   Neurons   Synapses
  1:1                    3936      3936
  1:10                   393       3930
  1:100                  39        3900

According to the resource utilisation, it is possible to implement 3936 fully parallel integrate-and-fire neurons on this device with a 1:1 neuron-to-synapse ratio. As will be seen in the following section, the implementation of a neuron model that requires multipliers has a significant impact on the network size that can be realised. For example, the inclusion of a multiplier would generally limit the number of neurons to the number of embedded multipliers available on the FPGA device; in the case of the XC2VP50 this figure is 232. Additional multipliers can be generated from the available FPGA logic; however, this is area intensive and rapidly consumes the remainder of the device resources. Thus the approach detailed in this section presents significant benefits for a fully parallel implementation.

It is also worth mentioning that the selection of the bit resolution plays an important role in the implementation of these models: as the resolution is reduced, the behaviour of the model changes, and the designer should therefore make a reasonable compromise between performance and logic utilisation. For the implementation detailed in this section a fixed-point coding scheme was used: 16 bits were used to represent the membrane voltage (8 bits for the fractional component) and 8 bits were used for the network weights (6 bits for the fractional component).

3.2. Large-scale implementations using a time-division multiplexing strategy



Table 3. I&F neuron model parameters

  Param       Common     Pyramidal      Inhibitory     Description
  c_m         8 nF/mm²   –              –              Membrane capacitance
  E_l         −70 mV     –              –              Membrane reversal potential
  v_th        −59 mV     –              –              Threshold voltage
  v_reset     −70 mV     –              –              Reset voltage
  t_ref       6.5 ms     –              –              Refractory period
  t_delay^j   0.5 ms     –              –              Propagation delay
  G_l         –          1 µS           1 µS           Membrane leak conductance
  A           –          0.03125 mm²    0.015625 mm²   Membrane surface area
  E_s^j       –          0 mV           −75 mV         Reversal potential of synapse
  τ_s^j       –          1 ms           4 ms           Synapse decay time

Table 4. SNN logic requirements for time-multiplexed implementation

  SNN component   Slices   Embedded multipliers
  Synapse         33       0
  STDP synapse    106      0
  Neuron          63       1


The figures in the previous section illustrate that, although moderately large-scale networks are viable using this approach, it is often desirable to simulate larger network architectures. Also, there is little correlation between the reduced-complexity model and the biologically realistic HH neuron model. To enable the underlying characteristics and responses of a more biologically plausible neuron model to be investigated and exploited, a conductance-based version of the integrate-and-fire (I&F) neuron model was therefore also targeted. A comparison between this model implemented on an FPGA platform and a software simulation of the HH neuron model can be found in a previous publication by the authors, where it can be seen that the conductance-based I&F model provides behavioural characteristics similar to those of the biologically plausible HH model [20]. The conductance-based I&F model consists of a first-order differential equation in which the rate of change of the neuron membrane voltage, v, is related to the membrane currents. The model can be described as

$$c_m \frac{dv(t)}{dt} = g_l\,(E_l - v(t)) + \sum_j \frac{w_j\, g_s^j(t)}{A}\,\big(E_s^j - v(t)\big). \qquad (2)$$

For a given synapse j, an action potential (AP) event at the presynaptic neuron at time $t_{ap}$ triggers a synaptic release event at time $t_{ap} + t_{delay}^j$, a discontinuous increase in the synaptic conductance:

$$g_s^j\big(t_{ap} + t_{delay}^j + dt\big) = g_s^j\big(t_{ap} + t_{delay}^j\big) + q_s^j; \qquad (3)$$

otherwise $g_s^j(t)$ is governed by

$$\frac{dg_s^j(t)}{dt} = -\frac{1}{\tau_s^j}\, g_s^j(t). \qquad (4)$$

The forward Euler integration scheme with a time step of $dt = 0.125$ ms was used to solve the model equations. Using this method, differential equation (2) becomes

$$v(t + dt) = v(t) + \frac{1}{c_m}\Big(g_l\,(E_l - v(t)) + \sum_j \frac{w_j\, g_s^j(t)}{A}\,\big(E_s^j - v(t)\big)\Big)\,dt \qquad (5)$$

and differential equation (4) becomes

$$g_s^j(t + dt) = g_s^j(t) - \frac{1}{\tau_s^j}\, g_s^j(t)\,dt. \qquad (6)$$

A description of the parameters in the above equations, and the values used for the implementation, can be found in Table 3.
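For illustration, the update equations (5) and (6) can be transcribed directly into software. The sketch below uses the common and pyramidal-synapse values from Table 3; two details are assumptions not stated in the text: the leak term is treated per-area (G_l/A) so that it is dimensionally consistent with the per-area capacitance, and the conductance jump q_s of Eq. (3) is given an illustrative value:

```python
DT = 0.125e-3     # Euler time step (s): 0.125 ms, as in the implementation
CM = 8e-9         # membrane capacitance per unit area (F/mm^2), Table 3
GL = 1e-6         # membrane leak conductance (S), Table 3
A = 0.03125       # pyramidal membrane surface area (mm^2), Table 3
EL, VTH, VRESET = -70e-3, -59e-3, -70e-3   # reversal, threshold, reset (V)
ES, TAU_S = 0.0, 1e-3                      # pyramidal synapse values, Table 3

def step(v, g_s, w, spike_arrived, q_s=0.3e-6):
    """One forward-Euler step of Eqs. (5) and (6) for a single synapse."""
    if spike_arrived:
        g_s += q_s                                              # Eq. (3)
    dv = (GL / A * (EL - v) + w * g_s / A * (ES - v)) * DT / CM  # Eq. (5)
    v += dv
    g_s -= (g_s / TAU_S) * DT                                   # Eq. (6)
    if v >= VTH:                                                # fire and reset
        return VRESET, g_s, True
    return v, g_s, False

v, g_s, n_spikes = EL, 0.0, 0
for n in range(800):                     # 100 ms, one input spike per ms
    v, g_s, fired = step(v, g_s, w=1.0, spike_arrived=(n % 8 == 0))
    n_spikes += fired
print(n_spikes)
```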

A hardware implementation of the model was realised using a fixed-point coding scheme: 18 bits were used to represent the membrane voltage (Eq. (5)), 10 bits of which correspond to the fractional component, and the synaptic conductance (Eq. (6)) was implemented using 12-bit precision for the fractional component. Multiplicand and divisor parameters were chosen as powers of 2 so that they could be implemented using binary shifts, thus utilising less logic than would be required by a full multiplier or divider.

A biologically plausible STDP algorithm, based on the Song and Abbott approach [70], was implemented in hardware to train the network. Each synapse in an SNN is characterised by a peak conductance q (the peak value of the synaptic conductance following a single pre-synaptic action potential) that is constrained to lie between 0 and a maximum value q_max. Every pair of pre- and post-synaptic spikes can potentially modify the value of q, and the changes due to each spike pair are continually summed to determine how q changes over time. The simplifying assumption is that the modifications produced by individual spike pairs combine linearly. A pre-synaptic spike occurring at time t_pre and a post-synaptic spike at time t_post modify the corresponding synaptic conductance by q → q + q_max F(Δt), where Δt = t_pre − t_post and F(Δt) is defined by

$$F(\Delta t) = \begin{cases} A_+\,\exp(\Delta t/\tau_+), & \text{if } \Delta t < 0,\\ -A_-\,\exp(-\Delta t/\tau_-), & \text{if } \Delta t \ge 0. \end{cases} \qquad (7)$$

The time constants τ_+ and τ_− determine the ranges of pre- to post-synaptic spike intervals over which synaptic strengthening and weakening are significant, and A_+ and A_− determine the maximum amount of synaptic modification in each case. The logic requirements to incorporate each SNN component on an FPGA are shown in Table 4.
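A direct transcription of the STDP rule of Eq. (7), with the peak conductance clamped to [0, q_max] as described above; the constants A± and τ± are illustrative rather than the values used in the hardware:

```python
import math

# Illustrative STDP constants (not the values used in the hardware):
A_PLUS, A_MINUS = 0.005, 0.00525   # maximum fractional modification
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # time constants (ms)
Q_MAX = 1.0                        # peak-conductance ceiling

def stdp_f(dt):
    """Eq. (7): dt = t_pre - t_post (ms). Negative dt (pre before post)
    strengthens the synapse; positive dt weakens it."""
    if dt < 0:
        return A_PLUS * math.exp(dt / TAU_PLUS)
    return -A_MINUS * math.exp(-dt / TAU_MINUS)

def update_q(q, t_pre, t_post):
    """Apply one spike pair, clamping q to [0, Q_MAX]."""
    return min(Q_MAX, max(0.0, q + Q_MAX * stdp_f(t_pre - t_post)))

print(update_q(0.5, t_pre=10.0, t_post=15.0))  # pre 5 ms before post: potentiation
print(update_q(0.5, t_pre=15.0, t_post=10.0))  # pre 5 ms after post: depression
```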

As discussed in the previous section, using a fully parallel implementation for this model, the number of neurons that can be implemented on an FPGA is generally limited to the number of embedded multipliers provided. Additional multipliers can be generated from the available FPGA logic; however, this is area intensive and rapidly consumes the remainder of the resources, leaving little logic to implement STDP synapses. The largest device in the Xilinx Virtex-II series of FPGAs is the XC2V8000, which consists of 46,592 slices and 168 embedded multipliers. Therefore, using an STDP-synapse-to-neuron connection ratio of 1:1, a total of 168 neurons and 168 STDP synapses could be implemented on an XC2V8000 device (using approximately 60% of the available resources for functional circuitry). Increasing the connection ratio of STDP synapses to neurons to a more realistic 10:1 enables a network of 41 neurons and 410 synapses to be implemented. Finally, using a more biologically plausible ratio of 100:1, the largest network that could be implemented would be 4 neurons and 400 STDP synapses.

Utilising this fully parallel implementation, with a clock speed of 100 MHz and a Euler integration time step of 0.125 ms per clock period, a 1 s real-time period of operation could be completed in 0.8 ms. This provides a speed-up factor of 12,500 compared to real-time processing, and compares very favourably with a software implementation, where the data are processed serially and a several-second real-time simulation can take hours to run. The speed-up factor indicates that there is scope for examining speed/area trade-offs and determining alternative solutions that will lead to larger network capabilities. By partitioning the design between hardware and software, a compromise can be made between the size of network that can be implemented and the computing speed. One approach is to use a controller that time-multiplexes a single neuron or synapse, which reduces the amount of logic required but increases computation time. A flow diagram for this process is shown in Fig. 4.

Fig. 4. Flow diagram for time-multiplexed implementation.

The program running on the processor loops through the number of neurons/synapses to be computed. It reads the previous time step's values for the neuron/synapse and sends them to the hardware component as inputs; the result of the hardware computation is read back and then written to RAM so that it can be used in the next time step's computation, as sketched below.
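In outline, the controller program implements the following read-compute-write-back loop (a schematic sketch; hw_compute is a hypothetical stand-in for the call into the hardware neuron/synapse block):

```python
def timestep(ram, n_units, hw_compute):
    """One simulated time step of the time-multiplexed controller.

    ram        : per-unit state values from the previous time step
    hw_compute : hypothetical stand-in for the shared hardware
                 neuron/synapse block
    """
    next_state = [None] * n_units
    for i in range(n_units):
        state = ram[i]              # read the previous time step's values
        result = hw_compute(state)  # send to the hardware block, read result
        next_state[i] = result      # stage the write-back
    ram[:] = next_state             # commit the whole time step at once

# Toy usage: "hardware" that leaks each membrane value towards zero.
ram = [1.0, 2.0, 3.0]
timestep(ram, len(ram), hw_compute=lambda v: 0.9 * v)
print(ram)                          # approximately [0.9, 1.8, 2.7]
```

The single shared datapath is what trades logic area for computation time: one hardware block is iterated over all units, rather than one block per unit.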

This approach has been adopted, and a system has been developed which utilises the Xilinx MicroBlaze soft processor core [78]. MicroBlaze is a 32-bit soft processor core featuring a reduced instruction set computer (RISC) architecture with Harvard-style separate 32-bit instruction and data buses. Synapse and neuron blocks are implemented in the logic alongside the processor cores and can be accessed to perform custom SNN instructions for the processor. An overview of the system developed by the authors is shown in Fig. 5.

Fig. 5. MicroBlaze system overview.

The main component in the system is the MicroBlaze soft processor core. The program running on this processor is contained in the local memory bus (LMB) block RAM (BRAM) and is accessed through the LMB BRAM controller. The neuron/synapse/STDP blocks are implemented in the FPGA logic and accessed by the processor






using the fast simplex link (FSL) interfaces, which are unidirectional point-to-point communication channels. The NN data for the network are held in SRAM external to the FPGA device. This memory is accessed over the on-chip peripheral bus (OPB) via the OPB external memory controller (EMC). As the memory is synchronous, a clock de-skew configuration using two digital clock manager (DCM) blocks with a feedback loop ensures that the clock used internally to the FPGA and that used to clock the static random access memory (SRAM) chips are consistent. Additional memory is also made available to the processor in the form of further BRAM, likewise accessed over the OPB bus. Internal status of the network, such as the synapse weights or output spike trains, can be displayed on a host PC via the serial port provided by the OPB universal asynchronous receiver-transmitter (UART). Finally, the OPB MDM provides an interface to the joint test action group (JTAG) debugger, allowing the system to be debugged if problems arise during the development of example networks. Multiple instances of the soft processor core can be instantiated on a single FPGA, so a degree of parallelism can be maintained in the network, while a number of FPGAs can also be employed to provide additional inter-chip parallelism. Thus the system can be extended by increasing the number of processors, as shown in Fig. 6.

Fig. 6. Multiple processor system.

The FPGA implementation for the experiments detailed in this paper took place on the BenNuey platform developed by Nallatech, which can accommodate up to seven FPGAs [87]. At present the system utilises two of these FPGAs; however, this is scalable, in that the additional FPGAs in the system can be configured with additional processors designated to operate in parallel. The size of network that can be implemented with this approach is limited by the amount of memory available on the hardware platform. The current configuration of the BenNuey system includes a total of 216 MB of off-chip SRAM. Each neuron requires 4 bytes, while each STDP synapse requires 8 bytes; Table 5 therefore indicates the size of network that can be implemented for various connection strategies.

Table 5. Network size capabilities for time-multiplexed implementation on BenNuey platform

  Mode               Neuron:synapse ratio   Neurons      Synapses
  Parallel           1:1                    1174         1174
                     1:10                   176          1760
                     1:100                  18           1800
                     1:1000                 2            2000
  Time-multiplexed   1:1                    18,000,000   18,000,000
                     1:10                   2,571,428    25,714,280
                     1:100                  268,656      26,865,600
                     1:1000                 26,986       26,986,000
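As a cross-check, the time-multiplexed rows of Table 5 follow directly from the figures just quoted (216 MB of off-chip SRAM, 4 bytes per neuron, 8 bytes per STDP synapse):

```python
SRAM_BYTES = 216_000_000    # 216 MB off-chip SRAM on the BenNuey platform
NEURON_B, SYNAPSE_B = 4, 8  # bytes per neuron and per STDP synapse

for ratio in (1, 10, 100, 1000):
    # neurons * (4 + ratio * 8) bytes must fit in SRAM
    neurons = SRAM_BYTES // (NEURON_B + ratio * SYNAPSE_B)
    print(f"1:{ratio}: {neurons:,} neurons, {neurons * ratio:,} synapses")
```

Running this reproduces the four time-multiplexed rows of Table 5 exactly.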

In order to test the system described above, a spiking network designed to perform co-ordinate transformation was implemented. This application was initially conceived as part of the SenseMaker project, which aims to create a hardware system capable of merging multi-sensory data in a biologically plausible manner in order to create a system perceptually aware, to some degree, of its environment [67]. Co-ordinate transformation is used to convert the arm angles of a haptic sensor to (x,y) co-ordinates, which are in turn used to generate a haptic image representation of the search space. A biologically plausible strategy was developed based on the principle of the basis function network proposed by Deneve et al. [12], and was further extended by incorporating the STDP algorithm to train the network to perform both one-dimensional (1D) and two-dimensional (2D) co-ordinate transformation.


Fig. 7. SNN trained by STDP for 1D co-ordinate transformation.

Table 6. Simulation performance

  Network                           PC (Matlab)   FPGA (4 MicroBlaze)
  1D Network (100 × 100 × 100)      2917 s        27.2 s
  1D Network (1400 × 1400 × 1400)   454,418 s     4237 s


For 1D co-ordinate transformation, an input layer of 100 neurons is fully connected to the output layer with synapses whose weights are determined by STDP, as shown in Fig. 7. The training layer, with 100 neurons, is connected to the output layer with fixed weights within a receptive field. The output layer receives spikes from both the input and the training layers during the training stage; only the input layer is used during the testing stage.

To test the maximum capability of this system, the largest possible network was also trained on the FPGA system. The network structure consisted of 1400 input, training and output neurons, divided across 4 processors with each responsible for 350 of the output neurons. In total there are 1,964,200 synapses and 4200 neurons in this network. The times taken to perform a 1 s real-time simulation using the approaches above are shown in Table 6.

The multi-processor FPGA approach achieves a performance improvement factor of 107.25 over the Matlab simulation, which was run on a representative 2 GHz Pentium 4 PC.

The system was further appraised by implementing a 2D version of the co-ordinate transformation application [75], which transforms arm angles (θ, Φ) to position co-ordinates (X, Y). Fig. 8 presents the network topology used for this application, which involves two input layers for the network inputs (θ, Φ), each consisting of 40 neurons. Each neuron in the θ and Φ layers is then connected to an intermediate 2D layer (40 × 40) within a vertical field and a horizontal field of fixed synapse strength, respectively, as shown in the figure. This intermediate layer is in turn connected by STDP synapses to two output layers of 80 neurons corresponding to the network outputs (X, Y). Also connected to these output layers, in a similar manner to the 1D network, are independent training layers for the two output layers. As in the 1D network, a training signal applied with a corresponding input signal causes the weights between the intermediate and output layers to be reinforced by STDP learning, and the network is thus able to learn how to perform 2D co-ordinate transformation. The training signal for the X and Y output layers is a Gaussian-distributed spike train centred on a neuron which is calculated as follows:

$$X = L\,[\cos(\theta) + \cos(\theta + \Phi)],$$
$$Y = L\,[\sin(\theta) + \sin(\theta + \Phi)].$$
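A sketch of this forward transformation, used here to locate the centre neuron of the Gaussian training signal; the segment length L = 20 (so that X and Y span ±40) and the linear co-ordinate-to-neuron mapping are assumptions for illustration:

```python
import math

L = 20.0  # assumed arm-segment length so that X and Y span [-40, +40]

def target_neurons(theta_deg, phi_deg):
    """Centre neurons of the Gaussian training signal on the X and Y layers."""
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    x = L * (math.cos(theta) + math.cos(theta + phi))
    y = L * (math.sin(theta) + math.sin(theta + phi))

    def to_index(c):
        # Assumed linear map from a co-ordinate in [-40, +40] to neurons 1..80.
        return max(1, min(80, round((c + 40.0) / 80.0 * 79.0) + 1))

    return to_index(x), to_index(y)

print(target_neurons(90.0, 0.0))  # arm straight up: X = 0, Y = 2L = +40
```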


Fig. 8. SNN topology for 2D co-ordinate transformation. (The θ and Φ input layers each map 0°–360° onto neurons 1–40; the X and Y output layers each map −40 to +40 onto neurons 1–80; the training signals are X = L[cos(θ)+cos(θ+Φ)] and Y = L[sin(θ)+sin(θ+Φ)].)


A plot of the input and output spike trains for a network after training is complete is shown in Fig. 9. The solid lines on the output spike-train plots indicate the desired response, and it can be seen that the output spike train is centred on this line.

The time required to perform a 1 s real-time simulation for both the PC and FPGA-based approaches is shown in Table 7. It can be seen that, again, the FPGA-based approach is significantly faster than the PC-based simulation.

However, it is evident that further improvements of the FPGA-based implementations would be required to achieve real-time performance. A more efficient approach is to compute neuron and synapse outputs only when necessary, i.e. an event-driven approach in which a component is updated only when input activity for that component is present, much like software SNN simulators such as NEURON. This strategy is presented in [21] and clearly highlights the benefits provided by this approach; a sketch of the principle is given below. Table 8 summarises the performance improvements obtained using an event-based approach. An important consideration with an event-based approach, however, is that the computation time is now proportional to the number of spike events that must be processed, i.e. the higher the frequency of input spikes the longer the computation time will be. For the experiments described, a spike event was presented on average every 2.083 ms of the simulation, therefore the combined input frequency presented to the network equated to 480 Hz. From this it can be deduced that the system is able to process 480 input spike events in 0.011 s, which equates to approximately 43,636 spike events per second. The results presented are for a multi-processor approach, i.e. 4 MicroBlaze processors operating in parallel with each processor responsible for 25 output neurons. At present the networks implemented using this multi-processor approach have been single-layer feed-forward, with the processors effectively working independently with no communication between them. It is acknowledged that further consideration of the approach will be required when implementing multilayer networks to ensure synchronisation between the processors. This work is currently under investigation.
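
The sketch below captures the event-driven principle in Python: neuron state is touched only when a spike event targets it, and the membrane decay since the last update is applied analytically from the elapsed time, so inactive components incur no cost. The queue format, time constant and threshold are illustrative assumptions rather than the authors' fixed-point implementation [21].

```python
import heapq
import math

def run_event_driven(events, n_neurons, tau=0.02, v_thresh=1.0):
    """Process spike events in time order; a neuron is updated only when
    an event targets it, with its decay since its previous update applied
    analytically.  tau, v_thresh and the event format are illustrative."""
    v = [0.0] * n_neurons       # membrane voltages
    last = [0.0] * n_neurons    # time of each neuron's previous update
    out_spikes = []
    heapq.heapify(events)       # list of (time, target_neuron, weight)
    while events:
        t, n, w = heapq.heappop(events)
        v[n] = v[n] * math.exp(-(t - last[n]) / tau) + w  # decay, then add input
        last[n] = t
        if v[n] >= v_thresh:    # threshold crossing: emit spike and reset
            out_spikes.append((t, n))
            v[n] = 0.0
    return out_spikes
```

The cost of this loop scales with the number of queued events rather than with network size or time resolution, which is the behaviour reflected in Table 8.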

4. Discussion and remaining challenges

One of the immediate issues in the time-efficient FPGA implementation of SNNs is the availability and usability of an appropriate development environment. Understanding the simulation tool always pays rich dividends, particularly given the system complexity associated with large-scale SNNs. Tool support is needed throughout the design flow, starting with the capture of system-level models all the way down to detailed implementation. Design decisions at system level have the most impact on the final performance [17,35,45].


Fig. 9. Spike plots for 2D co-ordinate transformation.

Table 7
Performance of 2D network for 1 s real-time simulation

Network      PC (Matlab)            FPGA (2 MicroBlaze)
2D network   135,000 s (≈37.5 h)    5400 s (≈90 min)

Table 8
Performance comparison for event-based approach

Network                        PC (Matlab)   FPGA (event-based, 4 MicroBlaze)
1D network (100 × 100 × 100)   2917 s        0.011 s


The previous sections have outlined that the development tools and the design-flow process carry over from more traditional digital design and can be readily adopted for SNN implementations on FPGAs. For example, Matlab and the Simulink/Xilinx System Generator toolbox have proved effective for the two different techniques used to realise large-scale implementations.

This work has reported on two approaches by the authors to realise large-scale implementations of an integrate-and-fire neuron, chosen for its inherent computational simplicity and yet relative biological plausibility. Both approaches are a compromise on the speed and accuracy of a fully parallel implementation. The time-multiplexing approach, while introducing a significant overhead in terms of the multiplexing requirements, still provides the engineer with a platform to potentially realise a very large-scale SNN. In addition, the approach presented facilitates further improvements, as additional processors can be employed to increase the parallelism in the implementation and obtain further speed gains. Furthermore, the trend towards the provision of multi-FPGA platforms affords even further improvements. The multiplier-less approach adopts an alternative strategy and concentrates on designing more efficient architectures for each synapse and neuron in the network. This has long been recognised as an important strategy for digital implementations, as realising the multiplication function is expensive in terms of the hardware logic resources of an FPGA. The approach presented does provide substantial improvements over the more traditional design; a software sketch of both ideas is given below. Finally, the authors have just procured a massively parallel FPGA platform [87] to further explore large-scale implementations of SNNs that will provide an equivalent computational performance to sub-regions of the brain. This work will be directed at image and speech processing applications.
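
The following Python sketch illustrates, in software terms, the two ideas just discussed: a single update unit that is time-multiplexed across every neuron in turn, and a multiplier-less leak in which the decay multiply is replaced by a shift-and-subtract. The decay factor (1 − 2⁻ᵏ), the 16-bit threshold scaling, the integer state format and the non-negative weights are illustrative assumptions, not the authors' exact fixed-point design.

```python
def time_multiplexed_step(v_mem, spike_in, w, k=4, v_thresh=1 << 16):
    """One time step of a single shared update unit serving every neuron
    in turn.  State is integer, and the leak 'v -= v >> k' realises a
    multiplier-less decay by (1 - 2**-k); k and the scaling are
    illustrative assumptions."""
    fired = []
    for i in range(len(v_mem)):          # the one "circuit" visits each neuron in sequence
        v = v_mem[i] - (v_mem[i] >> k)   # leak via shift-and-subtract, no multiplier
        if spike_in[i]:
            v += w[i]                    # accumulate the weighted input spike
        if v >= v_thresh:
            fired.append(i)
            v = 0                        # reset after firing
        v_mem[i] = v
    return fired
```

Because the leak reduces to a shift, the per-neuron update needs only adders and comparators, which is what makes a multiplier-less synapse/neuron array attractive on FPGA logic.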

However, despite the significant success of the work reported in this paper, there are still considerable outstanding challenges in this area. For example, there has been limited work reported in the literature comparing the effectiveness of different spiking neural models for an FPGA implementation [35,36]. Very detailed summaries of the differing spiking neural models available to engineers exist [32,33,41,77], although further investigation is still needed to determine the appropriateness of each model, or indeed the appropriate combination of models. The human brain is not composed of only one single type of neuron but rather contains a range of neurons that have evolved according to the computational restrictions and requirements within each area. This reflects the relative immaturity of research in spiking neural networks compared to the more classical approaches.

A major benefit of the time-multiplexed implementation approach is that it provides a great deal of flexibility in terms of the underlying neuron model that can be employed. As discussed previously, 32-bit-wide FSL links are used to interface the processor to the network IP components implemented in the FPGA logic. Due to the reconfigurable nature of the device, the neuron model employed can easily be modified and a new bitstream generated and downloaded. More attractively, however, only four of the eight available FSL links are currently utilised, with the neuron and synapse components consuming one link each while two links are allocated to the STDP component. It is therefore possible to add a further four distinct neuron models via the remaining unused links. Alternatively, as the neuron component currently uses only 24 of the available 32 bits (18 bits for the membrane voltage, 6 bits for the refractory value), address decoding using the redundant bits could be utilised to access multiple neuron models implemented in the logic via a single FSL link; 3 bits alone would enable eight distinct neuron models to be accessed. In both instances the selection of the neuron model is software controllable via the processor. This is important in that the implementation is not limited to a single neuron model for the entire network; there is the ability, if required, to employ varying neuron models in different regions of the network structure.
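
As a concrete illustration of this address-decoding idea, the sketch below packs the stated 18-bit membrane voltage and 6-bit refractory value into a 32-bit FSL word and uses 3 of the 8 redundant bits as a model-select address. The field positions are assumptions for illustration; the paper specifies only the bit widths.

```python
# Bit layout assumed for illustration (the paper states only the widths):
#   bits  0-17 : membrane voltage   (18 bits)
#   bits 18-23 : refractory value    (6 bits)
#   bits 24-26 : neuron-model select (3 of the 8 redundant bits -> 8 models)

def pack_fsl_word(v_mem, refractory, model_id):
    """Pack neuron state plus a model-select address into one 32-bit FSL word."""
    assert 0 <= v_mem < (1 << 18) and 0 <= refractory < (1 << 6) and 0 <= model_id < 8
    return (model_id << 24) | (refractory << 18) | v_mem

def unpack_fsl_word(word):
    """Recover (v_mem, refractory, model_id) from a 32-bit FSL word."""
    return word & 0x3FFFF, (word >> 18) & 0x3F, (word >> 24) & 0x7
```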

Similarly, the classical architectural design for NNs arranges neurons in fixed layers, with communication normally restricted to the two nearest neighbouring layers. Although such a regular structure affords a number of benefits for hardware implementations, the associated interconnect causes additional problems, particularly as network sizes increase. The effect is to introduce an interconnect bottleneck in the design such that only limited architectural designs remain viable, a well-recognised limitation in the area. This restriction provides the main motivation for time-multiplexed approaches to increasing network sizes. The human brain seems to have overcome this problem by optimising regions for specific functions, for example the numerous specialised regions along the visual pathway, while within and between each region there is still a high degree of connectivity. The limited understanding of such connectivity strategies reflects the embryonic stage of neuroscience research, which currently focuses on small numbers of neurons [15].

A similar but related problem is that of appropriate training and learning algorithms. Neuroscientists have established that local synaptic plasticity is realised via a Hebbian mechanism such as the spike timing-dependent plasticity rule [4,70] (a minimal form of this rule is sketched below). However, this is a localised unsupervised learning mechanism, and there is no indication of how it may be exploited at the network level. In particular, there is a need to develop supervised approaches for these biologically plausible networks that can be exploited in computational examples. There are already reported approaches that use a variant of the classic backpropagation algorithm [36,76] and an evolutionary approach [1,2]. Both approaches, however, are very difficult to implement in real time on a reconfigurable platform and do not incorporate a Hebbian strategy. A major challenge for large-scale implementations of SNNs is thus to develop network training strategies that enable a supervised approach (to solve the range of computational problems) while retaining biological plausibility (accommodating the localised plasticity characteristics). The final challenge is one of developing applications for the proposed architectures and training algorithms in solving engineering problems. A substantial part of the SNN literature arising from neuroscience research is not directly exploitable by engineers. The expectation is that, over the next few years, SNNs will be shown to provide the same performance (accuracy and/or speed) with fewer resources (e.g. memory in the case of software, or device area in the case of FPGAs/ASICs), and thus outperform classical engineering approaches. Potentially, this area can provide new solutions to unsolved problems.
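
A minimal additive, pair-based form of the STDP rule referred to above, in the spirit of [4,70], is sketched here; the amplitudes and time constants are illustrative placeholders rather than values from the paper.

```python
import math

def stdp_dw(t_pre, t_post, a_plus=0.005, a_minus=0.006,
            tau_plus=0.020, tau_minus=0.020):
    """Additive pair-based STDP: a pre-before-post pairing potentiates the
    synapse, post-before-pre depresses it.  Amplitudes and time constants
    are illustrative placeholders."""
    dt = t_post - t_pre
    if dt >= 0:   # causal pairing -> potentiation
        return a_plus * math.exp(-dt / tau_plus)
    else:         # anti-causal pairing -> depression
        return -a_minus * math.exp(dt / tau_minus)
```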

5. Conclusion

This paper has provided a comprehensive review of the reported research on FPGA implementations of classical and spiking neural networks. Spiking neural networks have received considerable recent attention as engineers aim to form a more comprehensive understanding of information processing in biological networks, and to investigate how this understanding could be used to improve traditional information processing techniques. The authors have provided a brief summary of two alternative approaches to realising large-scale implementations of neural networks on FPGAs. The time-division multiplexing approach exploits the higher computational speed of a hardware implementation, so that the resulting design can be realised in real time. The research has shown that the real benefits are obtained when an event-based strategy is employed, so as to minimise unnecessary processing overhead. The second approach considered a more fundamental question, attempting to optimise the architecture in terms of the circuitry required to realise synapses and neurons, and identifying that a major bottleneck was the requirement for multipliers, which consume excessive resources.

These developments have enabled the authors to realise large-scale SNNs on FPGAs using general-purpose design tools. However, the remaining challenges are also fundamental issues for spiking neural networks: the identification of appropriate models and combinations of models, interconnect and architectural issues, and finally training mechanisms that can be deployed on-chip so that networks can be trained and continue to learn in real time. These questions are also the subject of research in neuroscience, as researchers in biology continue to strive for a more comprehensive understanding of the mechanisms of the human brain. It is thus vitally important that the recent trend of collaborative research across a range of disciplines continues, and that such joint interaction addresses these remaining challenges.

References

[1] A. Belatreche, L.P. Maguire, T.M. McGinnity, Advances in design and application of spiking neural networks, Springer J. Soft Comput.—Fusion Found. Methodol. Appl. 11 (3) (2007) 239–248.
[2] A. Belatreche, L.P. Maguire, T.M. McGinnity, Q. Wu, Evolutionary design of spiking neural networks, J. New Math. Nat. Comput. (World Scientific Publishing) 2 (3) (2006) 237–253.
[3] J.-L. Beuchat, J.O. Haenni, E. Sanchez, Hardware reconfigurable neural networks, in: 5th Reconfigurable Architectures Workshop (RAW'98), Orlando, Florida, USA, 1998.
[4] G.Q. Bi, M.M. Poo, Synaptic modifications by correlated activity: Hebb's postulate revisited, Ann. Rev. Neurosci. 24 (2001) 139–166.
[5] J.J. Blake, T.M. McGinnity, L.P. Maguire, The implementation of fuzzy systems, neural networks and fuzzy neural networks using FPGAs, Inform. Sci. 112 (1) (1998) 151–168.
[6] N. Carlson, Foundations of Physiological Psychology, Simon & Schuster, Needham Heights, MA, 1992.
[7] Y. Chen, S. Hall, L. McDaid, O. Buiu, P. Kelly, On the design of a low power compact spiking neuron cell based on charge coupled synapses, in: International Joint Conference on Neural Networks (IJCNN 2006), IEEE, Vancouver, Canada, pp. 1511–1517.
[8] Y. Chen, S. Hall, L.J. McDaid, P.M. Kelly, Silicon synapse for Hebbian learning application, in: Proceedings of PREP, University of Lancaster, 2005, pp. 71–73.
[9] Y. Chen, S. Hall, L.J. McDaid, P.M. Kelly, A silicon synapse based on a charge transfer device for spiking neural network applications, in: 3rd International Symposium on Neural Networks (ISNN 2006), Chengdu, China, 2006.
[10] A. Delorme, J. Gautrais, R. VanRullen, S.J. Thorpe, SpikeNET: a simulator for modeling large networks of integrate and fire neurons, Neurocomputing 26–27 (1999) 989–996.
[11] A. Delorme, S.J. Thorpe, SpikeNET: an event-driven simulation package for modeling large networks of spiking neurons, Network: Comput. Neural Syst. 14 (2003) 613–627.
[12] S. Deneve, P.E. Latham, A. Pouget, Efficient computation and cue integration with noisy population codes, Nature Neurosci. 4 (2001) 826–831.
[13] J.G. Eldredge, FPGA density enhancement of a neural network through run-time reconfiguration, Master's Thesis, Department of Electrical and Computer Engineering, Brigham Young University, 1994.
[14] A. Ferrucci, ACME: a field-programmable gate array implementation of a self-adapting and scalable connectionist network, Master's Thesis, University of California, Santa Cruz, 1994.
[15] S. Furber, S. Temple, A. Brown, On-chip and inter-chip networks for modelling large-scale neural systems, in: IEEE Symposium on Circuits and Systems, Kos, Greece, 2006.
[16] H. De Garis, M. Korkin, The CAM-brain machine (CBM): an FPGA based hardware tool which evolves a 1000 neuron net circuit module in seconds and updates a 75 million neuron artificial brain for real time robot control, Neurocomputing 42 (1–4) (2002).
[17] A. Ghani, T.M. McGinnity, L.P. Maguire, Approaches to the implementation of large scale spiking neural networks on FPGAs, in: Proceedings of PREP, Lancaster, UK, 2005.
[18] A. Ghani, T.M. McGinnity, L.P. Maguire, Area efficient architecture for large scale implementation of biologically plausible spiking neural networks on reconfigurable hardware, in: FPL, 2006.
[19] B. Glackin, L.P. Maguire, T.M. McGinnity, Intrinsic and extrinsic implementation of a bio-inspired hardware system, Inf. Sci. 161 (1–2) (2004) 1–19.
[20] B. Glackin, T.M. McGinnity, L.P. Maguire, A. Belatreche, Q.X. Wu, Implementation of a biologically realistic spiking neuron model on FPGA hardware, in: JCIS 2005.
[21] B. Glackin, T.M. McGinnity, L.P. Maguire, Q.X. Wu, A. Ghani, A. Belatreche, An event based large scale spiking neural network simulator implemented on FPGA hardware, IEEE Neural Networks, submitted for publication.
[22] B. Glackin, T.M. McGinnity, L.P. Maguire, Q.X. Wu, A. Belatreche, A novel approach for the implementation of large scale spiking neural networks on FPGA hardware, in: Proceedings of IWANN 2005, Computational Intelligence and Bioinspired Systems, Barcelona, Spain, June 2005, Springer, Berlin, 2005, pp. 552–563.
[23] N.H. Goddard, K.J. Lynne, T. Mintz, The Rochester Connectionist Simulator, Technical Report 233 (revised), Computer Science Department, University of Rochester, March 1988.
[24] E.L. Graas, E.A. Brown, R.H. Lee, An FPGA-based approach to high-speed simulation of conductance-based neuron models, Neuroinformatics 2 (2004) 417–435.
[25] C. Grassmann, J.K. Anlauf, Fast digital simulation of spiking neural networks and neuromorphic integration with SPIKELAB, Int. J. Neural Syst. 9 (5) (1999) 473–478.
[26] G. Hartmann, G. Frank, M. Schaefer, C. Wolff, SPIKE128K—an accelerator for dynamic simulation of large pulse-coded networks, in: MicroNeuro'97, 1997, pp. 130–139.
[27] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice-Hall, Englewood Cliffs, NJ, 1999.
[28] D.O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
[29] H.H. Hellmich, H. Klar, SEE: a concept for an FPGA based emulation engine for spiking neurons with adaptive weights, in: 5th WSEAS International Conference on Neural Networks Applications (NNA'04), Udine, Italy, 2004, pp. 930–935.
[30] A.L. Hodgkin, A.F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, J. Physiol. 117 (1952) 500–544.
[31] G. Indiveri, P. Verschure, Autonomous vehicle guidance using analog VLSI neuromorphic sensors, in: Proceedings of the 7th International Conference on Neural Networks, Springer, Berlin, 1997, pp. 811–816.
[32] E.M. Izhikevich, Simple model of spiking neurons, IEEE Trans. Neural Networks 14 (2003) 1569–1572.
[33] E.M. Izhikevich, Which model to use for cortical spiking neurons?, IEEE Trans. Neural Networks 15 (2004) 1063–1070.
[34] A. Jahnke, U. Roth, H. Klar, A SIMD/dataflow architecture for a neurocomputer for spike-processing neural networks (NESPINN), in: MicroNeuro'96, 1996, pp. 232–237.
[35] S. Johnston, G. Prasad, L.P. Maguire, T.M. McGinnity, Comparative investigation into classical and spiking neuron implementations on FPGAs, in: Proceedings of the International Conference on Artificial Neural Networks (ICANN), Warsaw, Poland, 2005, pp. 269–274.
[36] S.P.J. Johnston, G. Prasad, L.P. Maguire, T.M. McGinnity, A. Belatreche, Investigation into the pragmatism of phenomenological spiking neurons for hardware implementation on FPGAs, in: Proceedings of the IEEE SMC UK-RI Chapter Conference, Derry, 2004, pp. 90–95.
[37] M.I. Jordan, R.A. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Neural Computation 6 (1994) 181–214.
[38] S. Le Masson, A. Laflaquiere, D. Dupeyron, T. Bal, G. Le Masson, Analog circuits for modeling biological neural networks: design and applications, IEEE Trans. Biomed. Eng. 46 (6) (1999) 638–645.
[39] M.A. Lewis, R. Etienne-Cummings, A.H. Cohen, M. Hartmann, Toward biomorphic control using custom aVLSI CPG chips, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2000.
[40] W.B. Ligon III, S. McMillan, G. Monn, K. Schoonover, F. Stivers, K.D. Underwood, A re-evaluation of the practicality of floating point operations on FPGAs, in: IEEE Symposium on FPGAs for Custom Computing Machines, Los Alamitos, CA, 1998, IEEE Computer Society Press, Silver Spring, MD, pp. 206–215.
[41] W. Maass, C.M. Bishop (Eds.), Pulsed Neural Networks, MIT Press, Cambridge, MA, 1999.
[42] M.H. Martin, A reconfigurable hardware accelerator for back-propagation connectionist classifiers, Master's Thesis, University of California, Santa Cruz, 1994.
[43] S. Maya, R. Reynoso, C. Torres, M. Arias Estrada, Compact spiking neural network implementation in FPGA, in: R.W. Hartenstein, H. Grunbacher (Eds.), FPL 2000, Lecture Notes in Computer Science 1896, 2000, pp. 270–276.
[44] W.S. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bull. Math. Biophys. 5 (1943) 115–133.
[45] S. Modi, P.R. Wilson, A.D. Brown, Behavioral simulation of biological neuron systems in SystemC, University of Southampton, UK.
[46] T. Nordstrom, Sparse distributed memory simulation on REMAP3, Research Report TULEA 1991:16, Division of Computer Science and Engineering, Lulea University of Technology, Sweden, 1991.
[47] T. Nordstrom, Highly parallel computers for artificial neural networks, Ph.D. Thesis (1995:162 f), Division of Computer Science and Engineering, Lulea University of Technology, Sweden, 1995.
[48] T. Nordstrom, On-line localized learning systems part II—parallel computer implementation, Research Report TULEA 1995:02, Division of Computer Science and Engineering, Lulea University of Technology, Sweden, 1995.
[49] T. Nordstrom, B. Svensson, Using and designing massively parallel computers for artificial neural networks, J. Parallel Distributed Comput. 14 (1992).
[50] E.M. Ortigosa, A. Canas, E. Ros, P.M. Ortigosa, S. Mota, J. Diaz, Hardware description of multi-layer perceptrons with different abstraction levels, Microprocessors and Microsystems 30 (7) (2006) 435–444.
[51] C. Panchev, S. Wermter, Spike-timing-dependent synaptic plasticity: from single spikes to spike trains, Neurocomputing 58–60 (2004) 365–371.
[52] C. Panchev, S. Wermter, H. Chen, Spike-timing dependent competitive learning of integrate-and-fire neurons with active dendrites, in: Proceedings of the International Conference on Artificial Neural Networks, Lecture Notes in Computer Science, Madrid, Spain, Springer, 2002, pp. 896–901.
[53] V. Pandya, S. Areibi, M. Moussa, A Handel-C implementation of the back-propagation algorithm on field programmable gate arrays, in: International Conference on Reconfigurable Computing and FPGAs, 2005.
[54] A. Perez-Uribe, Structure-adaptable digital neural networks, Ph.D. Thesis, Logic Systems Laboratory, Computer Science Department, Swiss Federal Institute of Technology-Lausanne, 1999.
[55] A. Perez-Uribe, Structure-adaptable digital neural networks, Ph.D. Thesis, EPFL, 1999.
[56] S. Renaud-Le Masson, G. Le Masson, L. Alvado, S. Saighi, J. Tomas, A neural simulation system based on biologically realistic electronic neurons, Inf. Sci. 161 (1–2) (2004) 57–69.
[57] D. Roggen, S. Hofmann, Y. Thoma, D. Floreano, Hardware spiking neural network with run-time reconfigurable connectivity in an autonomous robot, EPFL, Lausanne, Switzerland.
[58] D. Roggen, S. Hofmann, Y. Thoma, D. Floreano, Hardware spiking neural network with run-time reconfigurable connectivity in an autonomous robot, in: NASA/DoD Conference on Evolvable Hardware, July 2003.
[59] E.T. Rolls, M.J. Tovee, Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex, J. Neurophysiol. 73 (1995) 713–726.
[60] E. Ros, R. Agis, R.R. Carrillo, E.M. Ortigosa, Post-synaptic time-dependent conductances in spiking neurons: FPGA implementation of a flexible cell model, in: Proceedings of IWANN'03, Lecture Notes in Computer Science 2687, Springer, Berlin, 2003, pp. 145–152.
[61] E. Ros, E.M. Ortigosa, R. Agis, R. Carrillo, M. Arnold, Real-time computing platform for spiking neurons (RT-spike), IEEE Trans. Neural Networks 17 (4) (2006) 1050–1063.
[62] U. Roth, A. Jahnke, H. Klar, Hardware requirements for spike-processing neural networks, in: J. Mira, F. Sandoval (Eds.), From Natural to Artificial Neural Computation (IWANN), Springer, Berlin, 1995, pp. 720–727.
[63] M. Schaefer, T. Schoenauer, C. Wolff, G. Hartmann, H. Klar, U. Rueckert, Simulation of spiking neural networks—architectures and implementations, Neurocomputing 48 (1) (2002) 647–679.
[64] J. Schemmel, K. Meier, E. Muller, A new VLSI model of neural microcircuits including spike time dependent plasticity, in: Proceedings of the 2004 International Joint Conference on Neural Networks (IJCNN'04), IEEE Press, New York, 2004, pp. 1711–1716.
[65] T. Schoenauer, S. Atasoy, N. Mehrtash, H. Klar, Simulation of a digital neuro-chip for spiking neural networks, in: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2000), Como, Italy, 2000.
[66] T. Schoenauer, N. Mehrtash, A. Jahnke, H. Klar, MASPINN: novel concepts for a neuro-accelerator for spiking neural networks, in: VIDYNN'98, 1998.
[67] SenseMaker, IST-2001-34712, http://isel.infm.ulst.ac.uk/sensemaker/.
[68] M. Skrbek, Fast neural network implementation, Neural Network World 9 (5) (1999) 375–391.
[69] L.S. Smith, Spiking neural simulator, Department of Computing Science and Mathematics, University of Stirling, FK9 4LA, Scotland, UK.
[70] S. Song, K.D. Miller, L.F. Abbott, Competitive Hebbian learning through spike-timing dependent synaptic plasticity, Nat. Neurosci. 3 (2000) 919–926.
[71] M. Taveniku, A. Linde, A reconfigurable SIMD computer for artificial neural networks, Licentiate Thesis No. 189L, Department of Computer Engineering, Chalmers University of Technology, Goteborg, Sweden, 1995.
[72] A. Upegui, C.A. Pena-Reyes, E. Sanchez, A methodology for evolving spiking neural-network topologies on line using partial dynamic reconfiguration, in: International Conference on Computational Intelligence (ICCI), Medellin, Colombia, November 2003.
[73] A. Upegui, C.A. Pena-Reyes, E. Sanchez, A functional spiking neuron hardware oriented model, IWANN 1 (2003) 136–143.
[74] C. Wolff, G. Hartmann, U. Ruckert, ParSPIKE—a parallel DSP-accelerator for dynamic simulation of large spiking neural networks, in: MicroNeuro, 1999, p. 324.
[75] Q.X. Wu, T.M. McGinnity, L.P. Maguire, A. Belatreche, B. Glackin, Adaptive co-ordinate transformation based on a spike timing-dependent plasticity learning paradigm, Lecture Notes in Computer Science 3610, Springer, Berlin, 2005, pp. 420–429.
[76] Q.X. Wu, T.M. McGinnity, L.P. Maguire, B. Glackin, A. Belatreche, Learning under weight constraints in networks of temporal encoding spiking neurons, Int. J. Neurocomput., accepted for publication.
[77] W. Gerstner, W. Kistler, Spiking Neuron Models, Cambridge University Press, Cambridge, 2002.
[78] Xilinx, MicroBlaze Processor Reference Guide, UG081 (v5.1), 2005, http://www.xilinx.com/.
[79] X. Yao, Evolving artificial neural networks, Proc. IEEE 87 (9) (1999) 1423–1447.
[80] J. Zhu, P. Sutton, FPGA implementation of neural networks—a survey of a decade of progress, in: P.Y.K. Cheung, G.A. Constantinides, J.T. de Sousa (Eds.), 13th International Conference on Field-Programmable Logic and Applications (FPL 2003), Lisbon, Portugal, September 2003, pp. 1062–1066.
[81] http://neuron.duke.edu/
[82] http://www.genesis-sim.org/GENESIS/
[83] http://www-ra.informatik.uni-tuebingen.de/SNNS/
[84] http://darwin.ucd.ie/milligan/
[85] http://amygdala.sourceforge.net/
[86] http://www.ymer.org/research/bnntoolbox.htm
[87] http://www.nallatech.com/

Liam P. Maguire received the M.Eng. and Ph.D. degrees in Electrical and Electronic Engineering from the Queen's University of Belfast, Belfast, UK, in 1988 and 1991, respectively. He is a Reader and Acting Head of School of the School of Computing and Intelligent Systems, University of Ulster, Derry, UK. He is also a Member of the Intelligent Systems Engineering Laboratory at the University of Ulster. His current research interests are in two primary areas: fundamental research in bio-inspired intelligent systems (such as the development of computationally effective spiking neural networks) and the application of existing intelligent techniques in different domains (industrial process control, augmentative technologies and other disciplines such as supply chain management). He is the author or co-author of over 100 research papers.

Martin McGinnity has been a member of the University of Ulster academic staff since 1992, and holds the post of Professor of Intelligent Systems Engineering within the Faculty of Engineering. He has a first class honours degree in physics and a doctorate from the University of Durham, is a Fellow of the IET, a member of the IEEE, and a Chartered Engineer. He has 27 years of experience in teaching and research in electronic and computer engineering, leads the research activities of the Intelligent Systems Engineering Laboratory at the Magee campus of the University, and is currently Acting Associate Dean of the Faculty of Engineering, with responsibility for Research and Development, and Knowledge and Technology Transfer. His current research interests relate to the creation of intelligent computational systems in general, particularly in relation to hardware and software implementations of neural networks, fuzzy systems, genetic algorithms, embedded intelligent systems utilising re-configurable logic devices and bio-inspired cognitive systems.

Brendan Glackin received the 1st Class Honours degree in electronics and computing from the University of Ulster, Magee, UK and is currently pursuing the Ph.D. degree at this university. He is a Research Associate in the School of Computing and Intelligent Systems, University of Ulster, Derry, UK and is also a member of the Intelligent Systems Engineering Laboratory of the same university. His current research interests relate to the implementation in embedded systems of bio-inspired and hybrid intelligent systems.

Arfan Ghani received the degree of Bachelor of Electronic Engineering with distinction from NED Engineering University, Pakistan and the M.Sc. in Computer Systems Engineering from the Technical University of Denmark. He has more than three years of industrial experience with M/S SIEMENS Pakistan, M/S VITESSE Semiconductors Denmark, M/S Microsoft Denmark and Intel Research Cambridge, UK. Currently he is a Doctoral candidate at the Intelligent Systems Engineering Laboratory, University of Ulster, UK. His research interests include the design and implementation of bio-inspired architectures on reconfigurable platforms (FPGAs), hardware implementation of neural networks, VLSI system design and digital signal processing.

Ammar Belatreche received the degree of 'Ingenieur d'Etat' (B.Eng.) in computer systems from the National Institute of Informatics (INI), Algiers, Algeria in 1998. He is currently a Lecturer at the School of Computing and Intelligent Systems, University of Ulster, UK. He is also a member of the Intelligent Systems Engineering Laboratory at the University of Ulster, where he is undertaking a Ph.D. in computer science. His research interests include intelligent systems, spiking neural networks, artificial neural networks, evolutionary computing, pattern recognition, machine learning and computer vision.

Jim Harkin is a Lecturer in the School of Computing and Intelligent Systems at the University of Ulster, UK. He holds a B.Tech. (1996), M.Sc. (1997) and Ph.D. (2001) in electronic engineering from the University of Ulster, and is a member of the IET. He is also a member of the Intelligent Systems Engineering Laboratory at the University of Ulster, and his current research interests relate to the design of intelligent embedded systems to support self-repairing capabilities, and FPGA-based hardware implementation strategies for spiking neural networks. He has published over 30 articles in peer-reviewed journals and conferences.