
Technology Aware Training in Memristive Neuromorphic Systems based on non-ideal Synaptic Crossbars

Indranil Chakraborty, Deboleena Roy and Kaushik Roy, Fellow, IEEE

Abstract—The advances in the field of machine learning using neuromorphic systems have paved the pathway for extensive research on possibilities of hardware implementations of neural networks. Various memristive technologies such as oxide-based devices, spintronics and phase change materials have been explored to implement the core functional units of neuromorphic systems, namely the synaptic network and the neuronal functionality, in a fast and energy-efficient manner. However, various non-idealities in the crossbar implementations of the synaptic arrays can significantly degrade the performance of neural networks and hence impose restrictions on feasible crossbar sizes. In this work, we build mathematical models of various non-idealities that occur in crossbar implementations, such as source resistance, neuron resistance and chip-to-chip device variations, and analyze their impact on the classification accuracy of a fully connected network (FCN) and a convolutional neural network (CNN) trained with a standard training algorithm. We show that a network trained under ideal conditions can suffer accuracy degradation as large as 59.84% for FCNs and 62.4% for CNNs when implemented on non-ideal crossbars for relevant non-ideality ranges. This severely constrains feasible crossbar sizes. As a solution, we propose a technology aware training algorithm which incorporates the mathematical models of the non-idealities into the standard training algorithm. We demonstrate that our proposed methodology achieves significant recovery of testing accuracy, within 1.9% of the ideal accuracy for FCNs and 1.5% for CNNs. We further show that our proposed training algorithm can potentially allow the use of significantly larger crossbar arrays, of sizes 784×500 for FCNs and 4096×512 for CNNs, with a minor or no trade-off in accuracy.

Index Terms—Neural Networks, Memristive crossbar, backpropagation, neuromorphic, image recognition.

I. INTRODUCTION

RECENT developments in computational neuroscience have resulted in a paradigm shift away from Boolean computing in sequential von Neumann architectures as the research community strives to emulate the functionality of the human brain on neurocomputers. Although extensive research has been done to accelerate computational functions such as matrix operations on general-purpose computers, the parallelism of the human brain has remained elusive to the von Neumann architecture, thus engendering high hardware cost and energy consumption [1]. This has resulted in the exploration of non-von Neumann architectures with ‘massively parallel operations in-memory’, thus avoiding the overhead cost of exchanging data between memory and processor. Especially with the recent advances in machine learning for various cognitive tasks such as image recognition, natural language processing, etc., the search for such energy-efficient ‘in-memory computing’ platforms has become quintessential.

Although standardized hardware implementations of neuromorphic systems like CAVIAR [2], IBM TrueNorth [3] and SpiNNaker [4] have primarily been dominated by CMOS technology, memristor-based non-volatile memory (NVM) technology [5]–[9] has naturally evolved into an exciting prospect. To that end, various technologies such as spintronics [10], oxide-based memristors [11], [12], phase change materials (PCM) [13], etc., have shown promising progress in mimicking the functionality of the core computational units of a neural network, i.e., neurons and synapses.

I. Chakraborty, D. Roy and K. Roy are with the Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906 USA (e-mail: [email protected]; [email protected]; [email protected]).

The core functionality of a neuromorphic system is a parallelized dot product between the inputs and the synaptic weights [14]. This has been demonstrated to be efficiently realized by a dense resistive crossbar array [15], [16]. The ability to naturally compute matrix multiplications makes crossbar arrays the most convenient way of implementing neuromorphic systems. However, real crossbars can suffer from various non-idealities including device variations [17], [18], parasitic resistances, non-ideal sources, and neuron resistances. Although neural networks are generally robust against small variations in the crossbar, the aforementioned technological constraints can severely impact the accuracy of recognition tasks as well as restrict the crossbar size. Several techniques such as redundancy schemes [19], technology optimization [20] and modified training algorithms [21]–[23] have been explored for both on-chip and ex-situ learning to mitigate specific non-ideal effects such as IR drops and synaptic device variations. However, mathematical modeling of the non-idealities and its incorporation in the standard training algorithm needs further exploration.

In this work, we analyze the impact of non-idealities such as source resistance, neuron resistance, and synaptic weight variations in hardware implementations of neuromorphic crossbars. We show how such non-idealities can significantly degrade the accuracy when traditional training methodologies are employed. The presence of these parasitic elements also severely limits the crossbar sizes. As a solution, we propose an ex-situ technology aware training algorithm that mathematically models the aforementioned non-idealities and accounts for them in the traditional backpropagation algorithm. Such a technique not only preserves the accuracy of an ideal network appreciably but also allows us to use larger crossbar sizes without significant accuracy degradation.



Fig. 1. (a) Fully connected 3-layered neural network showing the input layer, hidden layer, and output layer. Each neuron in a particular layer is fed by a weighted sum of all inputs of the previous layer and performs a sigmoid operation on the sum to provide the inputs for the next layer. (b) CNN architecture with different convolutional and pooling layers terminated by a fully connected layer.

The key highlights of our work are as follows:

1) We mathematically model the effect of source resistance, neuron resistance, and variations in synaptic conductance on the output currents of a neuromorphic crossbar. We establish the validity of our model by comparing against SPICE-like simulations of resistive networks.

2) We analyze the impact of these non-idealities on the accuracy of two types of image recognition tasks with varying amounts of non-ideality within relevant technological limits.

3) We propose a training algorithm which incorporates the mathematical models of the crossbar non-idealities and modifies the standard training algorithm in an effort to restore the ideal accuracy.

II. CROSSBAR IMPLEMENTATION OF NEURAL NETWORKS

A. Types of network topologies

1) Fully Connected Networks: Traditionally, deep neural networks such as deep belief nets (DBNs) comprise multiple layers of interconnected units. Fully connected networks (FCNs) involve a series of neuron layers between the input and the output layers. The output of each neuron in a layer is connected to the inputs of all the neurons in the subsequent layer. Fig. 1(a) shows a 3-layered fully connected network consisting of a single hidden layer between the input and output layers.

2) Convolutional Networks: Complex image recognition datasets comprise objectively different classes, where global weight mapping as in FCNs proves to be less efficient. As an alternative, convolutional neural networks (CNNs) have been recognized as a more powerful tool for complex image recognition problems, using locally shared weights to learn common spatially local features. As shown in Fig. 1(b), CNNs consist of several layers performing operations like convolution, activation, and pooling, finally terminating with a fully connected layer. During the convolution operation, each filter bank, or kernel, is slid across the input to that layer to obtain a dot product between the input and the weights, known as the feature map. The number of output maps of each convolutional layer denotes the number of different feature maps learned in that layer. Thus a convolution operation captures the spatially local features of an input image. Convolution of an m×m input map with a kernel of size n×n yields an output map of size ((m−n+2p)/s+1) × ((m−n+2p)/s+1), where s is the stride of the filter and p is the padding. In practice, s and p are chosen such that the original input size is preserved. The activation layer, which can be ReLU [24], sigmoid [25], or another non-linear function, introduces a non-linearity in the network [26]. The pooling layer reduces the dimensionality of the output map; the most commonly used pooling techniques are average and max-pooling [27]. Finally, the fully connected layer uses the learned features to classify the images. In essence, a fully connected layer can also be represented by a convolutional layer whose kernel size is equal to the input size.
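For concreteness, the output-size arithmetic above can be checked with a minimal Python sketch (ours, for illustration only):

```python
def conv_output_size(m, n, p=0, s=1):
    """Side length of the output map for an m x m input convolved
    with an n x n kernel, padding p and stride s."""
    return (m - n + 2 * p) // s + 1

# A 5x5 kernel with stride 1 and padding 2 preserves a 32x32 input.
assert conv_output_size(32, 5, p=2, s=1) == 32
```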

B. Hardware representations of Neural networks

In hardware realizations of neural networks, the synaptic connections between the neurons of two adjacent layers are represented using a resistive crossbar. The weights are represented in terms of conductances and the inputs are encoded as voltages. Convolutional layers have locally concentrated connections, hence each filter bank is represented by a crossbar of equivalent size. The input to the crossbar is a subset of the image being sampled by the kernel. Each element of the output map is calculated through time multiplexing of the outputs from a particular crossbar for different subsets of the image. This is repeated for each filter bank to obtain the different output maps. In contrast, fully connected layers have all possible connections between input and output, and the entire connection matrix can be represented by a crossbar. The basic computational function of any layer is a dot product, and it can be seamlessly performed by representing the weights as resistances in a crossbar fashion. The output current of the $j$th neuron of each crossbar is computed as $I_j = \sum_i \left(V_i^+ G_{ji}^+ + V_i^- G_{ji}^-\right)$, where $V_i$ is the input voltage corresponding to the $i$th input neuron and $G_{ji}$ represents the conductance corresponding to the synaptic weight between the neurons. Two resistive arrays are deployed to account for bipolar weights: the input to the positive array is $+V_i$, whereas the input to the negative array is $-V_i$. The weight matrix $[w_{ji}]$ is mapped to a corresponding conductance range $(G_{low}, G_{high}) \subset (G_{on}, G_{off})$. To represent bipolar weights, the conductance of the synapse connecting the $j$th neuron in the next layer to the $i$th input has a positive ($G_{ji}^+$) component and a negative ($G_{ji}^-$) component. For positive (negative) weights, the programming is done such that $G_{ji}^+\,(G_{ji}^-) = |w_{ji}|\,G_{high}$ and $G_{ji}^-\,(G_{ji}^+) = 0$ (no connection). Fig. 2(a) shows a crossbar implementation of a fully connected neural network layer.
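As an illustration of this two-array mapping, a minimal NumPy sketch (our own, not the authors' implementation; the conductance scale is an assumed value) is:

```python
import numpy as np

def ideal_crossbar_currents(v, w, g_high=1.0 / 40e3):
    """Ideal output currents I_j of a bipolar crossbar pair.

    v : (M,) input voltages; w : (N, M) signed weight matrix.
    Positive weights are programmed on the positive array and
    negative weights on the negative array; the other entry is 0.
    """
    g_pos = np.where(w > 0, w, 0.0) * g_high    # G+ = |w| * G_high
    g_neg = np.where(w < 0, -w, 0.0) * g_high   # G- = |w| * G_high
    # I_j = sum_i V_i * G+_ji + (-V_i) * G-_ji
    return g_pos @ v + g_neg @ (-v)
```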


TABLE I
RESISTANCE RANGES FOR VARIOUS TECHNOLOGIES

Technology    | (Ron, Roff)                          | Considered Range (Rlow, Rhigh) | Rs/Rhigh (%) | Rneu/Rhigh (%)
TiO2 [28]     | 15k, 2M                              | 40k, 600k                      | 0.033–0.13   | 0–0.033
Ag/Si [18]    | 25k, 10M                             | 100k, 1.5M                     | 0.013–0.053  | 0–0.013
TaOx [29]     | 1k, 1M                               | 20k, 300k                      | 0.067–0.27   | 0–0.067
Spintronics*  | Function of MTJ oxide thickness [30] | 40k, 400k                      | 0.05–0.2     | 0–0.05
PCM [13]      | 10k, 3M                              | 60k, 900k                      | 0.022–0.08   | 0–0.022

*The spintronic analysis is done based on predictive measures of Roff/Ron [31].
Rs range: 200 to 800 Ω; Rneu range: 0 to 200 Ω.

Fig. 2. (a) Hardware implementation of a single fully connected network layer represented by two resistive crossbar arrays (positive and negative), computing $I = V^+ G^+ + V^- G^-$. The output of the crossbar is fed to another crossbar representing the next layer. (b) An arrangement of multiple sub-crossbars to realize the functionality of a large crossbar.


As mentioned earlier, crossbar arrays can suffer from non-ideal effects that impose limitations on their sizes. As a result, larger crossbars are divided into smaller crossbars, and the output of each crossbar is time-multiplexed to obtain the desired functionality of the entire crossbar. Fig. 2(b) shows how multiple small crossbars can be efficiently mapped to realize the functionality of a large crossbar in a particular layer. The small size of the crossbar reduces fan-out and fan-in, thus minimizing the impact of non-idealities. FCNs, being densely connected, are severely affected by hardware imperfections, especially when implemented on large crossbars. Convolutional layers in CNNs are usually implemented on very small crossbars and are thus insensitive to non-ideal effects. However, the final fully connected layers, which act as the classifier, can be significantly affected by these non-idealities due to their large sizes. In this work, we therefore consider the impact of non-idealities on FCNs and on the fully connected layers of CNNs.
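To make the tiling concrete, the following NumPy sketch (our illustration; the tile sizes are the 112×100 example used later in the paper) splits a large weight matrix into sub-crossbars and accumulates their time-multiplexed partial outputs:

```python
import numpy as np

def tiled_crossbar_matvec(w, v, m=112, n=100):
    """Emulate a large crossbar with m x n sub-crossbars.

    w : (N_out, M) weight matrix; v : (M,) input vector.
    Partial currents of tiles sharing output columns are summed,
    mimicking the time-multiplexed accumulation in hardware.
    """
    n_out, m_in = w.shape
    out = np.zeros(n_out)
    for r in range(0, m_in, m):           # blocks of input rows
        for c in range(0, n_out, n):      # blocks of output columns
            out[c:c + n] += w[c:c + n, r:r + m] @ v[r:r + m]
    return out

# In the ideal (non-ideality-free) case this matches the monolithic product.
w = np.random.randn(500, 784); v = np.random.rand(784)
assert np.allclose(tiled_crossbar_matvec(w, v), w @ v)
```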

C. Training

The training of artificial neural networks (ANNs) is traditionally done off-chip through the standard backpropagation algorithm, which updates the weight matrices using the gradient descent technique [32]. It is important to note the vital aspects of the algorithm here, in relevance to the later sections. The basic algorithm updates weights based on the gradients of a cost function. The cost function depends on the error computed from the feed-forward network and assumes the form $C = \frac{1}{2}\sum_j (y_j - a_j)^2$, where $y_j$ is the expected output and $a_j$ is the actual output of the $j$th neuron in the output layer. The sensitivities of the errors for each layer are calculated from the derivatives of the cost function with respect to the outputs and weights, and after each iteration the weights of the corresponding layer are updated based on them. A detailed description of the algorithm is well documented [32]. In this work, we focus on the aspects of the algorithm pertinent to fully connected layers, and we build mathematical models to account for the non-idealities experienced by hardware implementations of neuromorphic crossbars.

D. Technologies

Various technologies have been explored for crossbar implementations of neural networks. Memristive crossbars based on different material systems (like TaOx [33], TiO2 [34], Ag/Si [35], etc.) have been proposed to realize neuromorphic functionality in an energy-efficient manner. Phase change materials (PCM) [13] have also been investigated as potential candidates for neuromorphic computing due to their high scalability. More recently, neurons and synapses implemented with spintronic devices [10], [16] have shown great promise for ultra-low-power neuromorphic computing. However, each technology suffers from specific drawbacks. An important metric in regard to resistive crossbars for neuromorphic systems is the ratio of the high resistance state (Roff) to the low resistance state (Ron) of the synaptic device. Usually, a high Roff/Ron ratio is desired for a near-ideal implementation of the weights in a neuromorphic crossbar. Moreover, in the light of non-ideal systems, higher values of Ron and Roff may be less significantly impacted by parasitic resistances. In this work, we have chosen a maximum to minimum conductance ratio of 15 ($G_{low} = \alpha/R_{off}$, $G_{high} = 15\,G_{low}$, where $\alpha$ is a parameter of choice), which is a potentially realizable predictive measure for all the memory technologies considered [13], [31], [36].

III. MODELING THE NON-IDEALITIES

Fig. 3. Crossbar architecture showing non-ideal elements like source and neuron resistances ($R_s$, $R_{neu}$). The final output current equation is modified by the impact of these non-ideal elements.

Memristor-based neuromorphic crossbar designs leverage the crossbar's inherent capability of matrix multiplication to provide high accuracy at a relatively modest computational cost [37]. However, memristor technology is still in its nascent stage. Thus, hardware implementations of such crossbars may suffer from various kinds of non-ideal effects arising from memristor device variations, parasitic resistances, and non-idealities in the sources and sensing neurons. In this work, we have considered three kinds of non-idealities that arise in crossbar implementations, namely,

1) Neuron resistance ($R_{neu}$)
2) Source resistance ($R_s$)
3) Memristive resistance variations

To analyze the impact non-idealities might have on the accuracy of a recognition task, it is important to note the ratio of the non-ideal resistances to the synaptic resistances for a particular technology. Table I shows the considered ranges of the ratios Rs/Rhigh and Rneu/Rhigh for various technologies, considering relevant values of the source (Rs) and neuron (Rneu) resistances.

A. Neuron Resistance

The resistance offered by the neuron in a neuromorphic crossbar varies from technology to technology. In many cases, such as PCM technology, the resistance of the neuron is not a hardware issue, as the crossbar outputs are sensed through a sense amplifier whose virtual ground at the input eliminates the voltage drop across the neuron. However, in spintronic crossbars [10], the crossbar outputs are fed to the neuron as a current stimulus and thus the resistance of the neuronal device becomes relevant. Fig. 3 shows the effect of neuron resistance on the crossbar output. It can be mathematically modeled by modifying the ideal output current expression as:

$$I_j = \frac{\sum_i V_i^+ G_{ji}^+ + V_i^- G_{ji}^-}{1 + R_{neu}\sum_i \left(G_{ji}^+ + G_{ji}^-\right)} \qquad (1)$$

Here, $I_j$, $V_i^{+/-}$, $G_{ji}^{+/-}$ and $R_{neu}$ carry the same meanings as in the earlier sections. Eqn (1) can be derived by applying Kirchhoff's law at the output nodes of the crossbar and taking the voltage drop across the neuron to be $V_{j,neu} = I_j \times R_{neu}$. It is evident that the denominator is close to 1 for smaller arrays, as the $G_{ji}$ values are much smaller than the neuron conductance (neuron resistances are of the order of a few hundred ohms [10]). However, larger arrays can lead to $G_{neu} = 1/R_{neu}$ becoming comparable to the sum of the conductances in a particular column. More specifically, a higher number of rows in the crossbar leads to an enhanced impact of the neuron resistance.
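A direct NumPy transcription of Eqn (1) (a sketch under our naming conventions; the resistance value is illustrative) is:

```python
import numpy as np

def currents_with_neuron_resistance(v, g_pos, g_neg, r_neu=200.0):
    """Eqn (1): column currents degraded by the neuron resistance.

    v : (M,) input voltages; g_pos, g_neg : (N, M) conductance arrays.
    """
    i_ideal = g_pos @ v + g_neg @ (-v)                  # numerator
    gamma = 1.0 + r_neu * (g_pos + g_neg).sum(axis=1)   # denominator
    return i_ideal / gamma
```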

B. Source Resistance

The source resistance ($R_s$) in a neuromorphic crossbar arises from non-ideal voltage sources and input access selectors, lumped together. The input voltages to the crossbar get degraded due to $R_s$, and the degradation can be mathematically modeled as:

$$V_{i,deg}^+ = V_i^+\,\frac{1/R_s}{1/R_s + \sum_j \frac{1}{R_{ij}^+ + R_{neu}}} \qquad (2)$$

$$V_{i,deg}^- = V_i^-\,\frac{1/R_s}{1/R_s + \sum_j \frac{1}{R_{ij}^- + R_{neu}}} \qquad (3)$$

Here $R_{ij}^{+/-}$ is the resistance of the synaptic element between the $i$th row and the $j$th column in the positive or negative array. The model ignores the effect of sneak paths. In neuromorphic crossbars, all the inputs are simultaneously active. As the IR drops in the metal lines are negligible, all the nodes in a particular row are supplied by the degraded source voltage of that row. As all the rows are supplied by voltages of the same polarity, even the shortest possible current sneak path experiences a low potential difference. Thus, the current through the series connection of the synaptic memristor and the neuron is primarily dependent on the degraded supply voltage and the effective series resistance. We have verified the validity of this model by comparing against SPICE-like simulations, which is described in more detail in Section IV-B.



C. Memristive Conductance Variations

The weights obtained from the training algorithm are usually discretized in order to be represented as memristive synapses. In this work, we have used a 4-bit discretization technique with an $R_{high}/R_{low}$ ratio of 15, relevant to the technologies considered. We have mapped the weights such that the maximum weight always maintains this ratio of 15 to the minimum weight. We have chosen the maximum and minimum weight limits so as to minimize the accuracy degradation due to discretization. To analyze the impact of chip-to-chip variation of weights, we have introduced weight variations in terms of standard deviation ($\sigma$) errors, ranging from $-2\sigma$ to $+2\sigma$, after discretization. This implies that all the memristive devices on a neuromorphic chip suffer the same variation at a particular process corner. The weight variations are incorporated in the mathematical model as a $\Delta$ variation of the conductances.
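A minimal sketch of such a ratio-15, 4-bit discretization with a global σ shift follows; rounding to the nearest level and the handling of the sign are our assumptions about the scheme:

```python
import numpy as np

def discretize_weights(w, bits=4, ratio=15.0, sigma_shift=0.0):
    """Quantize |w| to 2**bits levels spanning [w_max/ratio, w_max],
    then apply a chip-wide variation of sigma_shift * std(w).

    Signs are preserved; mapping to conductances is then a scaling
    by G_high, as described in Section III-D.
    """
    w_max = np.abs(w).max()
    levels = np.linspace(w_max / ratio, w_max, 2 ** bits)
    idx = np.abs(np.abs(w)[..., None] - levels).argmin(axis=-1)
    w_q = np.sign(w) * levels[idx]
    return w_q + sigma_shift * w.std()   # e.g. sigma_shift = -2.0
```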

D. Proposed Training Algorithm

The mathematical representations of the non-idealities are finally collated and incorporated in the feed-forward path and the backpropagation algorithm for training the ANN. Weights $w_{ji}$ and inputs $a_i$ replace the conductances $G_{ji}$ and voltages $V_i$, respectively, in Eqn (1) and Eqn (2). The symbol $z_j$ is used to represent the crossbar output current $I_j$ corresponding to the $j$th neuron of the next layer. We assume that the neuronal function receives a current input and provides a voltage output. For the sake of simplicity, we assume ideal mathematical representations of activation functions like ReLU [24] and sigmoid [25]. As described in Section II, the ideal crossbar output of the $j$th column in any layer is given by $z_j = \sum_i a_i \times w_{ji}$. The modified crossbar output can be computed as follows:

$$z_j^l = \frac{\sum_i a_{i,deg}^+\, w_{ji,vary}^+ + a_{i,deg}^-\, w_{ji,vary}^-}{\gamma_j} \qquad (4)$$

where

$$\gamma_j = 1 + R_{neu}\sum_i \left(w_{ji,vary}^+ + w_{ji,vary}^-\right), \qquad a_{i,deg}^+ = a_i\,\frac{1/R_s}{\beta_i^+}, \qquad a_{i,deg}^- = -a_i\,\frac{1/R_s}{\beta_i^-},$$

$$w_{ij,vary}^{+/-} = w_{ij}^{+/-} + \Delta, \qquad \beta_i^{+/-} = 1/R_s + \sum_j \frac{1}{R_{ij}^{+/-} + R_{neu}}, \qquad R_{ij}^{+/-} = 1/w_{ij,vary}^{+/-}.$$
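Collecting these definitions, the non-ideal forward pass of Eqn (4) for one layer can be sketched as follows (our own composition of the formulas above, with illustrative parameter values; the weight splitting used here is described in the next paragraph):

```python
import numpy as np

def nonideal_layer_output(a, w, r_s=400.0, r_neu=200.0, delta=0.0):
    """Eqn (4): non-ideal crossbar output z for one fully connected layer.

    a : (M,) layer inputs; w : (N, M) signed, discretized weights;
    delta is the global (chip-level) conductance variation.
    """
    w_pos = np.where(w > 0, w, 0.0) + delta    # w+ with variation
    w_neg = np.where(w < 0, -w, 0.0) + delta   # w- with variation
    out = np.zeros(w.shape[0])
    for w_arr, sign in ((w_pos, 1.0), (w_neg, -1.0)):
        r = 1.0 / np.maximum(w_arr.T, 1e-12)   # R_ij = 1 / w_ij, (M, N)
        beta = 1.0 / r_s + (1.0 / (r + r_neu)).sum(axis=1)
        a_deg = sign * a * (1.0 / r_s) / beta  # degraded inputs a+/-_deg
        out += w_arr @ a_deg
    gamma = 1.0 + r_neu * (w_pos + w_neg).sum(axis=1)
    return out / gamma
```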

As described earlier, two weight matrices are deployed to account for the bipolar weights of the original weight matrix $W = [w_{ji}]$. Positive (negative) inputs are fed to the positive (negative) weight array. The weight matrices are created such that $w_{ji}^+\,(w_{ji}^-) = 0$ for all $i, j$ for which $W_{ji} < 0\;(> 0)$, and $w_{ji}^+\,(w_{ji}^-) = W_{ji}$ for all $i, j$ for which $W_{ji} > 0\;(< 0)$. Note that mapping the weights to a particular conductance range is equivalent to multiplication by a scaling factor, as we have already discretized the weights based on a maximum to minimum weight ratio equal to $G_{high}/G_{low} = 15$. Thus an equivalent representation in terms of conductances would be $G_{ji}^{+/-} = W_{ji} G_{high}$.

The output of each crossbar is passed as input to the next crossbar through a sigmoid function such that $a_i^{L+1} = \sigma(z_i^L)$ (where $L$ is the layer index). The backpropagation algorithm is modified to account for the modified crossbar functionality. As described earlier, learning in neural networks relies on the computation of gradients of a cost function. Here, it is calculated from the error between the expected and the actual outputs of the output layer neurons, in the form $C = \frac{1}{2}\sum_j (y_j - a_j^L)^2$.

The delta rule in the backpropagation algorithm [38] involves the calculation of a $\delta$ for each layer, accounting for the change in the cost function for a unit change in the inputs to that particular layer. Thus, $\delta$ for a layer $l$ can be written as follows.

For the output layer,

$$\delta_j^L = \frac{\partial C}{\partial z_j^L} = \sum \frac{\partial C}{\partial a_j^L}\,\frac{\partial a_j^L}{\partial z_j^L} = (a_j^L - y_j)\,\sigma'(a_j^L) \qquad (5)$$

For the other layers,

$$\delta_j^l = \frac{\partial C}{\partial z_j^l} = \sum_k \frac{\partial C}{\partial z_k^{l+1}}\,\frac{\partial z_k^{l+1}}{\partial z_j^l} = \sum_k \delta_k^{l+1}\,\frac{\partial z_k^{l+1}}{\partial z_j^l} \qquad (6)$$

$$\frac{\partial z_k^{l+1}}{\partial z_j^l} = \frac{\partial z_k^{l+1}}{\partial a_j^l}\,\frac{\partial a_j^l}{\partial z_j^l} = \frac{\partial z_k^{l+1}}{\partial a_j^l}\,\sigma'(a_j^l)$$

$$\frac{\partial z_k^{l+1}}{\partial a_j^l} = \frac{\dfrac{a_{j,deg}^{+,l}}{a_j^l}\, w_{jk,vary}^+ \;-\; \dfrac{a_{j,deg}^{-,l}}{a_j^l}\, w_{jk,vary}^-}{\gamma_j} \qquad (7)$$

Finally, the $\delta$s of each layer are used to compute the weight updates as:

$$dw_{jk}^l = \frac{\partial C}{\partial w_{jk}^l} = \frac{\partial C}{\partial z_j^l}\,\frac{\partial z_j^l}{\partial w_{jk}^l} = \delta_j^l\,\frac{\partial z_j^l}{\partial w_{jk}^l} \qquad (8)$$

$$\frac{\partial z_j^l}{\partial w_{jk}^l} = \frac{\gamma_j\left(a_{k,deg}^+\left(1 - \dfrac{w_{kj}^+}{\beta_k^+}\right) + a_{k,deg}^-\left(1 - \dfrac{w_{kj}^-}{\beta_k^-}\right)\right) - R_{neu}\,z_j^l\,\gamma_j}{\gamma_j^2} \qquad (9)$$

To simulate the impact of the non-idealities on varying crossbar sizes, we divide the large crossbars of size $M \times N$ into several smaller crossbars of size $m \times n$. Fig. 2(b) shows the network architecture combining smaller crossbars to realize the neuromorphic functionality of larger crossbars. The source degradation factor $\beta_i$ is more prominent for a larger number of columns, as it depends on the term $\sum_j 1/(R_{ij} + R_{neu})$ summed over the columns. The neuron resistance degradation factor $\gamma_j$, on the other hand, increases with the number of rows due to its dependence on the term $\sum_i w_{ji}$ summed over the rows. Thus, the combined effect of these two non-idealities is expected to have a higher impact on the network for larger crossbars.


TABLE II
CNN ARCHITECTURE

Input: 32 × 32 RGB image
5 × 5 conv. 64, ReLU
2 × 2 max-pooling, stride 2
5 × 5 conv. 128, ReLU
2 × 2 max-pooling, stride 2
3 × 3 conv. 256, ReLU
2 × 2 avg-pooling, stride 2
4 × 4 conv. 512, sigmoid (fully connected)
0.5 dropout
1 × 1 conv. 10 (fully connected)
10-way softmax

IV. SIMULATION FRAMEWORK

A. Model simulations

The model described in the previous section was implemented on FCNs using the MATLAB® Deep Learning Toolbox [39] and on CNNs using MatConvNet [40].

1) FCN: A 3-layered neural network was employed to recognize digits from the MNIST dataset. The training set consists of 60000 images, while the testing set consists of 10000 images. The input layer consists of 784 neurons, designated to carry the information of each pixel of each 28×28 image. The hidden layer consists of 500 neurons, and the output layer has 10 neurons to recognize the 10 digits. The neuron transfer function was chosen to be the sigmoid function, $\sigma(x) = \frac{1}{1+e^{-x}}$.

2) CNN: For the classification of the more complex dataset CIFAR-10, we have used a network with ReLU-activated convolutional layers and a sigmoid-activated fully connected layer. The architecture is represented as 32×32×3-64c5-2s-128c5-2s-256c3-2s-512o-10o. The details of the layers are provided in Table II. Each convolutional layer is followed by a batch-normalization layer for better performance. We concentrate our analysis on the fully connected layers of the network, as the initial convolutional layers possess local connections implemented on small crossbars equal to the kernel sizes.

B. SPICE-like Simulations for validation

Each fully connected layer of both the FCN and the CNN can be implemented in a crossbar architecture comprising all possible connections. A SPICE-like framework was implemented in MATLAB® by creating a netlist of all connections, voltage sources, and source and neuron resistances in such resistive crossbars, and evaluating the voltages at each node by solving the conductance matrix: $[V] = [G]^{-1}[I]$. The framework was benchmarked against HSPICE®. This framework was used to calculate the output of non-ideal crossbars on application of inputs from the MNIST dataset as voltages. The resistances of the crossbar elements were determined such that $R_{ji} = 1/w_{ji}$, where $w_{ji}$ are the weights determined by the ideal training scheme described in the previous subsection. The output obtained by showing 100 images of the testing set was averaged, and the distribution was compared with the mathematical model simulations. Fig. 4(a) shows the comparison of the distributions of output currents of a crossbar, where the approximate model shows good agreement with the exact SPICE-like simulations. Fig. 4(b) shows that the normalized root mean square deviation (NRMSD) between the two techniques remains very close to zero for relevant $(R_s + R_{neu})/R_{high}$ combinations. It is to be noted that the validation of our approximate model was important in the context of reducing the training time, as the matrix operations can be performed far more efficiently using the mathematical model than by simulating the network for each input image in HSPICE®.
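The comparison itself reduces to a linear solve and a deviation metric; a sketch (the netlist assembly is omitted, and the NRMSD normalization by the output range is our assumption) is:

```python
import numpy as np

def exact_node_voltages(g_matrix, i_vector):
    """Solve the nodal equations [G][V] = [I] of the crossbar netlist."""
    return np.linalg.solve(g_matrix, i_vector)

def nrmsd(model_currents, exact_currents):
    """Normalized root mean square deviation between the approximate
    model and the exact SPICE-like solution."""
    rmsd = np.sqrt(np.mean((model_currents - exact_currents) ** 2))
    return rmsd / (exact_currents.max() - exact_currents.min())
```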

Fig. 4. (a) Distribution of output currents ($I_{mean}$), averaged over 100 images, across the 500 neurons of the hidden layer, comparing the approximate model to the SPICE-like simulation framework. (b) Variation of the normalized root mean square deviation (NRMSD) with the non-ideality ratio. NRMSD is close to zero for the relevant range of non-idealities.

V. RESULTS AND DISCUSSION

We analyzed the impact of technological constraints in crossbar implementations on both FCNs and CNNs. As fully connected layers form the crux of classification in both network topologies, it is expected that such non-ideal conditions will have similar detrimental effects on both. We present the detailed impact of each non-ideality on FCNs and CNNs for better understanding.

We consider the 3-layered FCN and the CNN architecture described in Table II to analyze the impact of the non-idealities on the accuracy of recognition tasks on the MNIST and CIFAR-10 datasets, respectively. The other convolutional layers in the CNN are usually implemented using small crossbars and hence do not suffer significant effects of non-ideal resistances.


Fig. 5. Accuracy degradation vs. $R_{neu}/R_{high}$ ratio for different $R_s/R_{high}$ combinations, comparing the technology aware training scheme with normal training for (a) FCN and (b) CNN.


First, the neural networks were trained under ideal conditions using the training set. Then, the non-ideal model was included in the feed-forward path, and the ideally trained network was tested using the testing set to determine the performance degradation due to the non-idealities. Next, the technology aware training algorithm was implemented by incorporating the mathematical formulation of the non-idealities in the standard training iterations of feed-forward and backpropagation, as described in Section III-D. For each iteration, the weights were discretized as described in Section III-C. The testing accuracy of an ideally trained FCN with a sigmoid neuronal function was 98.12% on MNIST, and that of an ideally trained CNN was 85.6% on CIFAR-10. The accuracy degradations discussed in this section have been calculated with respect to these ideal testing accuracies, such that Accuracy Degradation (%) = Ideal Accuracy (%) − Accuracy Obtained (%). We use the parameters Rs/Rhigh and Rneu/Rhigh to denote the ratios of the non-ideal resistances to the maximum synaptic resistance.

Fig. 6. Accuracy degradation vs. $\sigma$ variations in weights for various $R_s/R_{high}$ and $R_{neu}/R_{high}$ combinations, comparing the technology aware training scheme with normal training for (a) FCN and (b) CNN.


1) Source and Neuron Resistance: Figs. 5(a) and 5(b) show the accuracy degradation for different Rneu/Rhigh and Rs/Rhigh combinations in the FCN and CNN, respectively. The effect of the non-ideal resistances on the performance of the network predictably worsens monotonically with higher Rs/Rhigh and Rneu/Rhigh ratios. It can be observed that with normal training methods, the non-ideal resistances result in accuracy degradation for the FCN of up to 41.58% for Rneu/Rhigh = 0.07% and Rs/Rhigh = 0.27%. Our proposed training scheme incorporates the impact of the non-idealities and achieves significant restoration of accuracy, within 1.9% of the ideal accuracy, for the worst-case combination of resistances considered, as shown in Fig. 5(a).

In the case of CNNs, we show that due to the large crossbar sizes of the fully connected layers, the network can suffer up to 59.3% degradation in accuracy for the worst-case non-ideal resistances considered. Our proposed algorithm, on the other hand, achieves an accuracy within 1.5% of the ideal accuracy (Fig. 5(b)), considering the largest crossbar sizes for the architecture.



2) Weight variations: On-chip crossbar implementations suffer from chip-to-chip device variations. To account for such variations, we form a defect weight matrix and include it in the feed-forward network, as described in detail in Section III-C. We have considered up to ±2σ variation in the synaptic weights. Fig. 6 shows the impact of such device variations on the accuracy of the FCN and CNN for different combinations of Rs/Rhigh and Rneu/Rhigh. Predictably, changes in the positive direction reduce the accuracy degradation from the nominal (no variation) case, as they enhance the significance of the neurons. However, changes in the negative direction slightly degrade the accuracy from the nominal case. It is observed that a −2σ variation can result in an accuracy degradation of up to 59.9% for Rneu/Rhigh = 0.067% and Rs/Rhigh = 0.27% in the FCN. By accounting for these variations in the backpropagation algorithm, our proposed training methodology successfully restores the accuracy to within 2.34% of the ideal accuracy for the worst case of non-idealities considered, as shown in Fig. 6(a).

Weight variations in the negative direction also adversely affect CNNs, where a −2σ variation can result in an accuracy degradation of 62.4% considering the non-ideal resistances mentioned above. Our proposed algorithm achieves an accuracy within 0.8% of the ideal testing accuracy, as shown in Fig. 6(b).

3) Crossbar Size: Non-idealities in crossbars usually establish restrictions on the allowable crossbar sizes due to the dependence of their performance on fan-in and fan-out. For example, the impact of Rs on the crossbar depends on the parallel combination of the column resistances, and a higher number of columns (and hence higher fan-out) results in severe performance degradation. Also, the impact of Rneu intensifies with an increasing number of rows in the crossbar, as more rows lead to higher fan-in. As observed in Fig. 7(a), the combined effect of these resistances and variations can result in significant accuracy degradation (41.58%) when the network is implemented on crossbars of sizes 784×500 and 500×10 for the respective layers of the FCN. Under the same non-ideal conditions, the accuracy degradation drops to 1.2% when smaller crossbars of sizes 112×100 and 100×10 are used to represent the functionality of the network. In contrast, considering the same Rs and Rneu, our proposed training algorithm achieves an accuracy degradation within ∼1.89% for sizes 784×500, 500×10 and ∼0.3% for sizes 112×100, 100×10. Thus, the proposed algorithm ensures that a network implemented on larger crossbars can parallel the performance of ideally trained networks implemented on smaller crossbars with minimal degradation.

The convolutional layers in CNNs are implemented on smaller crossbars. For the fully connected layers of the CNN architecture, we have considered significantly larger crossbars of sizes 4096×512 and 512×10. Due to the large sizes of the last two layers of the considered architecture, we show in Fig. 7(b) that the network, when trained under ideal conditions, can suffer as much as 59.3% degradation in accuracy for the worst-case resistance constraints considered.

Fig. 7. Accuracy degradation vs. crossbar size for various $R_s/R_{high}$ and $R_{neu}/R_{high}$ combinations, comparing the technology aware training scheme with normal training for (a) FCN and (b) CNN. Larger crossbars show higher accuracy degradation.

On the other hand, using smaller crossbars of sizes 512×64 and 64×10 reduces the accuracy degradation to 2.4% under the same conditions. In comparison, a network trained with the proposed technology aware training algorithm restores the accuracy to within ∼1.5% of the ideal accuracy even for the largest crossbar sizes (4096×512, 512×10). Thus, the proposed algorithm ensures that a CNN with fully connected layers implemented on crossbars of sizes in the order of 4096×512 can achieve better performance than one implemented on crossbars of size 512×64 with standard training algorithms. Such a provision of using large crossbars for implementing neuromorphic systems could potentially reduce the overheads of repeating inputs and time-multiplexing outputs, thus ensuring faster operation.

VI. CONCLUSION

Hardware implementations of neuromorphic systems in crossbar architectures can suffer from various non-idealities, resulting in severe performance degradation when employed in machine learning applications such as recognition tasks, natural language processing, etc. In this work, we analyzed, by means of mathematical modeling, the impact of non-idealities such as source resistance, neuron resistance and chip-to-chip device variations on the performance of a 3-layered FCN on MNIST and a state-of-the-art CNN architecture on CIFAR-10. Severe degradation in recognition accuracy, up to 59.84%, was observed in FCNs. Although the convolutional layers of a CNN can be implemented on smaller crossbars, the large fully connected layers at the end make CNNs prone to performance degradation (up to 62.4% for our example). As a solution, we proposed a technology aware training algorithm which incorporates the mathematical models of the non-idealities in the training algorithm. Considering relevant ranges of non-idealities, our proposed methodology recovered the performance of networks implemented on non-ideal crossbars to within 2.34% of the ideal accuracy for FCNs and 1.5% for CNNs. We further showed that the proposed technology aware training algorithm enables the use of larger crossbars, of sizes in the order of 4096×512 for CNNs and 784×500 for FCNs, without significant performance degradation. Thus, we believe that the proposed work potentially paves the way for the implementation of neuromorphic systems on large crossbars, which is otherwise rendered unfeasible by standard training algorithms.

ACKNOWLEDGMENT

The research was funded in part by the National Science Foundation, the Center for Spintronics funded by DARPA and SRC, Intel Corporation, the ONR MURI program, and the Vannevar Bush Faculty Fellowship.

REFERENCES

[1] W. A. Wulf and S. A. McKee, “Hitting the memory wall,” ACM SIGARCH Computer Architecture News, vol. 23, no. 1, pp. 20–24, Mar. 1995. [Online]. Available: https://doi.org/10.1145%2F216585.216588

[2] R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gomez-Rodriguez, L. Camunas-Mesa, R. Berner, M. Rivas-Perez, T. Delbruck, S.-C. Liu, R. Douglas, P. Hafliger, G. Jimenez-Moreno, A. Ballcels, T. Serrano-Gotarredona, A. Acosta-Jimenez, and B. Linares-Barranco, “CAVIAR: A 45k neuron, 5M synapse, 12G connects/s AER hardware sensory-processing-learning-actuating system for high-speed visual object recognition and tracking,” IEEE Transactions on Neural Networks, vol. 20, no. 9, pp. 1417–1438, Sep. 2009. [Online]. Available: https://doi.org/10.1109%2Ftnn.2009.2023653

[3] P. Merolla, J. Arthur, F. Akopyan, N. Imam, R. Manohar, and D. S. Modha, “A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm,” in 2011 IEEE Custom Integrated Circuits Conference (CICC). IEEE, Sep. 2011. [Online]. Available: https://doi.org/10.1109%2Fcicc.2011.6055294

[4] X. Jin, M. Lujan, L. A. Plana, S. Davies, S. Temple, and S. B. Furber, “Modeling spiking neural networks on SpiNNaker,” Computing in Science & Engineering, vol. 12, no. 5, pp. 91–97, Sep. 2010. [Online]. Available: https://doi.org/10.1109%2Fmcse.2010.112

[5] L. Chua, “Memristor—the missing circuit element,” IEEE Transactions on Circuit Theory, vol. 18, no. 5, pp. 507–519, 1971.

[6] H. Shiga, D. Takashima, S.-i. Shiratake, K. Hoya, T. Miyakawa, R. Ogiwara, R. Fukuda, R. Takizawa, K. Hatsuda, F. Matsuoka et al., “A 1.6 GB/s DDR2 128 Mb chain FeRAM with scalable octal bitline and sensing schemes,” IEEE Journal of Solid-State Circuits, vol. 45, no. 1, pp. 142–152, 2010.

[7] K. Osada, T. Kawahara, R. Takemura, N. Kitai, N. Takaura, N. Matsuzaki, K. Kurotsuchi, H. Moriya, and M. Moniwa, “Phase change RAM operated with 1.5-V CMOS as low cost embedded memory,” in Custom Integrated Circuits Conference, 2005. Proceedings of the IEEE 2005. IEEE, 2005, pp. 431–434.

[8] S.-S. Sheu, M.-F. Chang, K.-F. Lin, C.-W. Wu, Y.-S. Chen, P.-F. Chiu, C.-C. Kuo, Y.-S. Yang, P.-C. Chiang, W.-P. Lin et al., “A 4Mb embedded SLC resistive-RAM macro with 7.2 ns read-write random-access time and 160 ns MLC-access capability,” in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International. IEEE, 2011, pp. 200–202.

[9] S. Chung, K.-M. Rho, S.-D. Kim, H.-J. Suh, D.-J. Kim, H.-J. Kim, S.-H. Lee, J.-H. Park, H.-M. Hwang, S.-M. Hwang et al., “Fully integrated 54nm STT-RAM with the smallest bit cell dimension for high density memory application,” in Electron Devices Meeting (IEDM), 2010 IEEE International. IEEE, 2010, pp. 12–7.

[10] A. Sengupta, Y. Shim, and K. Roy, “Proposal for an all-spin artificial neural network: Emulating neural and synaptic functionalities through domain wall motion in ferromagnets,” IEEE Transactions on Biomedical Circuits and Systems, vol. 10, no. 6, pp. 1152–1160, Dec. 2016. [Online]. Available: https://doi.org/10.1109%2Ftbcas.2016.2525823

[11] M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, and D. B. Strukov, “Training and operation of an integrated neuromorphic network based on metal-oxide memristors,” Nature, vol. 521, no. 7550, pp. 61–64, May 2015. [Online]. Available: https://doi.org/10.1038%2Fnature14441

[12] C. Liu, Q. Yang, B. Yan, J. Yang, X. Du, W. Zhu, H. Jiang, Q. Wu, M. Barnell, and H. Li, “A memristor crossbar based computing engine optimized for high speed and accuracy,” in 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, Jul. 2016. [Online]. Available: https://doi.org/10.1109%2Fisvlsi.2016.46

[13] S. B. Eryilmaz, D. Kuzum, R. Jeyasingh, S. Kim, M. BrightSky, C. Lam, and H.-S. P. Wong, “Brain-like associative learning using a nanoscale non-volatile phase change synaptic device array,” Frontiers in Neuroscience, vol. 8, Jul. 2014. [Online]. Available: https://doi.org/10.3389%2Ffnins.2014.00205

[14] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, Jan. 2015. [Online]. Available: https://doi.org/10.1016%2Fj.neunet.2014.09.003

[15] P. Gu, B. Li, T. Tang, S. Yu, Y. Cao, Y. Wang, and H. Yang, “Technological exploration of RRAM crossbar array for matrix-vector multiplication,” in The 20th Asia and South Pacific Design Automation Conference. IEEE, Jan. 2015. [Online]. Available: https://doi.org/10.1109%2Faspdac.2015.7058989

[16] A. Sengupta and K. Roy, “A vision for all-spin neural networks: A device to system perspective,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, no. 12, pp. 2267–2277, Dec. 2016. [Online]. Available: https://doi.org/10.1109%2Ftcsi.2016.2615312

[17] S. Yu, X. Guan, and H.-S. P. Wong, “On the stochastic nature of resistive switching in metal oxide RRAM: Physical modeling, Monte Carlo simulation, and experimental characterization,” in 2011 International Electron Devices Meeting. IEEE, Dec. 2011. [Online]. Available: https://doi.org/10.1109%2Fiedm.2011.6131572

[18] K.-H. Kim, S. Gaba, D. Wheeler, J. M. Cruz-Albrecht, T. Hussain, N. Srinivasa, and W. Lu, “A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications,” Nano Letters, vol. 12, no. 1, pp. 389–395, Jan. 2012. [Online]. Available: https://doi.org/10.1021%2Fnl203687n

[19] S. Kannan, N. Karimi, R. Karri, and O. Sinanoglu, “Detection, diagnosis, and repair of faults in memristor-based memories,” in 2014 IEEE 32nd VLSI Test Symposium (VTS). IEEE, Apr. 2014. [Online]. Available: https://doi.org/10.1109%2Fvts.2014.6818762

[20] P.-Y. Chen, D. Kadetotad, Z. Xu, A. Mohanty, B. Lin, J. Ye, S. Vrudhula, J.-S. Seo, Y. Cao, and S. Yu, “Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip,” in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015. IEEE Conference Publications, 2015. [Online]. Available: https://doi.org/10.7873%2Fdate.2015.0620

[21] B. Liu, H. Li, Y. Chen, X. Li, T. Huang, Q. Wu, and M. Barnell, “Reduction and IR-drop compensations techniques for reliable neuromorphic computing systems,” in 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, Nov. 2014. [Online]. Available: https://doi.org/10.1109%2Ficcad.2014.7001330

[22] P.-Y. Chen, B. Lin, I.-T. Wang, T.-H. Hou, J. Ye, S. Vrudhula, J.-S. Seo, Y. Cao, and S. Yu, “Mitigating effects of non-ideal synaptic device characteristics for on-chip learning,” in 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, Nov. 2015. [Online]. Available: https://doi.org/10.1109%2Ficcad.2015.7372570

[23] C. Liu, M. Hu, J. P. Strachan, and H. H. Li, “Rescuing memristor-based neuromorphic design with high defects,” in Proceedings of the 54th Annual Design Automation Conference 2017 (DAC 2017). ACM Press, 2017. [Online]. Available: https://doi.org/10.1145%2F3061639.3062310

[24] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.

[25] J. J. Hopfield, “Neurons with graded response have collective computational properties like those of two-state neurons,” Proceedings of the National Academy of Sciences, vol. 81, no. 10, pp. 3088–3092, 1984.

[26] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.

[27] K. Jarrett, K. Kavukcuoglu, Y. LeCun et al., “What is the best multi-stage architecture for object recognition?” in Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009, pp. 2146–2153.

[28] R. Berdan, E. Vasilaki, A. Khiat, G. Indiveri, A. Serb, and T. Prodromakis, “Emulating short-term synaptic dynamics with memristive devices,” Scientific Reports, vol. 6, no. 1, Jan. 2016. [Online]. Available: https://doi.org/10.1038%2Fsrep18639

[29] K. M. Kim, J. J. Yang, J. P. Strachan, E. M. Grafals, N. Ge, N. D. Melendez, Z. Li, and R. S. Williams, “Voltage divider effect for the improvement of variability and endurance of TaOx memristor,” Scientific Reports, vol. 6, no. 1, Feb. 2016. [Online]. Available: https://doi.org/10.1038%2Fsrep20085

[30] X. Fong, S. K. Gupta, N. N. Mojumder, S. H. Choday, C. Augustine, and K. Roy, “KNACK: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque MRAM bit-cells,” in 2011 International Conference on Simulation of Semiconductor Processes and Devices. IEEE, Sep. 2011. [Online]. Available: https://doi.org/10.1109%2Fsispad.2011.6035047

[31] A. Hirohata, H. Sukegawa, H. Yanagihara, I. Zutic, T. Seki, S. Mizukami, and R. Swaminathan, “Roadmap for emerging materials for spintronic device applications,” IEEE Transactions on Magnetics, vol. 51, no. 10, pp. 1–11, Oct. 2015. [Online]. Available: https://doi.org/10.1109%2Ftmag.2015.2457393

[32] D. Rumelhart, G. Hinton, and R. Williams, “Learning internal representations by error propagation,” in Readings in Cognitive Science. Elsevier, 1988, pp. 399–421. [Online]. Available: https://doi.org/10.1016%2Fb978-1-4832-1446-7.50035-2

[33] I.-T. Wang, Y.-C. Lin, Y.-F. Wang, C.-W. Hsu, and T.-H. Hou, “3D synaptic architecture with ultralow sub-10 fJ energy per spike for neuromorphic computation,” in 2014 IEEE International Electron Devices Meeting. IEEE, Dec. 2014. [Online]. Available: https://doi.org/10.1109%2Fiedm.2014.7047127

[34] F. Alibart, E. Zamanidoost, and D. B. Strukov, “Pattern classification by memristive crossbar circuits using ex situ and in situ training,” Nature Communications, vol. 4, Jun. 2013. [Online]. Available: https://doi.org/10.1038%2Fncomms3072

[35] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, “Nanoscale memristor device as synapse in neuromorphic systems,” Nano Letters, vol. 10, no. 4, pp. 1297–1301, Apr. 2010. [Online]. Available: https://doi.org/10.1021%2Fnl904092h

[36] J. Borghetti, G. S. Snider, P. J. Kuekes, J. J. Yang, D. R. Stewart, and R. S. Williams, “‘Memristive’ switches enable ‘stateful’ logic operations via material implication,” Nature, vol. 464, no. 7290, pp. 873–876, Apr. 2010. [Online]. Available: https://doi.org/10.1038%2Fnature08940

[37] M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves, S. Lam, N. Ge, J. J. Yang, and R. S. Williams, “Dot-product engine for neuromorphic computing: programming 1T1M crossbar to accelerate matrix-vector multiplication,” in Design Automation Conference (DAC), 2016 53rd ACM/EDAC/IEEE. IEEE, 2016, pp. 1–6.

[38] R. Hecht-Nielsen et al., “Theory of the backpropagation neural network,” Neural Networks, vol. 1, no. Supplement-1, pp. 445–448, 1988.

[39] R. B. Palm, “Prediction as a candidate for learning deep hierarchical models of data,” Technical University of Denmark, vol. 5, 2012.

[40] A. Vedaldi and K. Lenc, “MatConvNet,” in Proceedings of the 23rd ACM International Conference on Multimedia 2015. ACM Press, 2015. [Online]. Available: https://doi.org/10.1145%2F2733373.2807412