
Applications of Memristors in ANNs

Outline

• Brief intro to ANNs

• Firing rate networks
  – Single layer perceptron experiment
  – Other (simulation) examples

• Spiking networks and STDP

ANNs

An ANN is a bio-inspired, massively parallel network, i.e. a directed graph with nodes acting as neurons and edges acting as synapses. The functionality is learned during a training phase by changing the weights of the synapses.

• By topology
• By learning paradigm
• By coding of neural information

Very good review

Applications

Challenges

Complexity:
• ~10¹¹ neurons
• ~10¹⁵ synapses

Connectivity:
• ~1 : 10,000

Massive parallelism:
• "100-step rule": neurons fire at a few to several hundred hertz, yet face recognition takes ~100 ms, so at most ~100 sequential steps are available
• Cortex: 2–3 mm thick, ~2200 cm² in area

McCulloch‐Pitts neuron

1943

Different activation functions

By topology

By learning paradigm

Key questions: capacity, sample complexity, computational complexity

By information coding

• Firing rate vs. spiking models

Perceptron: Main idea

Single layer perceptron: inputs x1, x2, …, x9 plus a bias input x0; weights w0, w1, …, w9. Output:

y = sgn[Σ_{i=0}^{9} w_i x_i]

Hebbian rule

• Learning using local information

• Orientation selectivity

Multilayer perceptron

Key questions: number of layers, number of hidden neurons

Backpropagation

Gradient descent method to minimize a cost function (see the sketch below)
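As a minimal illustration (not the talk's own code), here is a single-neuron gradient-descent step in Python, assuming a quadratic cost; backpropagation applies the same idea layer by layer via the chain rule:

```python
import numpy as np

def gradient_descent_step(w, x, d, lr=0.1):
    """One gradient-descent step for a single linear neuron.

    Cost E = 0.5 * (d - y)^2 with y = w . x, so dE/dw = -(d - y) * x;
    stepping against the gradient reduces the cost.
    """
    y = w @ x                   # neuron output
    grad = -(d - y) * x         # gradient of E with respect to w
    return w - lr * grad        # move against the gradient

# toy usage: the output converges toward the target d
w, x, d = np.zeros(3), np.array([1.0, 0.5, -0.5]), 1.0
for _ in range(100):
    w = gradient_descent_step(w, x, d)
```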

Competitive learning

Learning binary patterns with a competitive network

Instar learning law (standard form): Δw_j = α · y_j · (x − w_j), i.e. the winning unit's weight vector moves toward the input.

What happens if more than four unique patterns are presented?

What happens when an all-white pattern is presented?

Complementary coding

• Resolves the no-signal issue for a particular (instar) learning law (a minimal sketch follows below)

• How to learn invariance? (translation, size, angle, etc.)
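A minimal sketch of complementary coding, assuming the usual convention of appending the complement 1 − x of each input in [0, 1]:

```python
import numpy as np

def complement_code(x):
    """Append the complement of the input, doubling its dimension.

    The coded vector has a constant L1 norm, so an all-white
    ("no signal") pattern still drives instar-style learning.
    """
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])

print(complement_code([0.0, 0.0, 0.0]))  # -> [0. 0. 0. 1. 1. 1.]
```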

With added complex cells

• AND in the bottom layer, OR in the top; present one-hot patterns to the top layer

Perceptron: Main idea

Single layer perceptron on a 3×3 binary pixel array: inputs x1, …, x9 encode the pixels (x = +1 for black, x = −1 for white) plus a bias input x0; the weights w0, …, w9 are the hardware bottleneck. Output:

y = sgn[Σ_{i=0}^{9} w_i x_i]

Considered training/test patterns

Pattern "X": class d = +1; pattern "T": class d = −1.

Perceptron training rule: Δw_i = α · x_i(p) · (d(p) − y(p)) — a software sketch follows below.
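A software sketch of this training rule in Python; the 3×3 pixel patterns below are illustrative stand-ins, not necessarily the exact patterns used in the experiment:

```python
import numpy as np

# Illustrative 3x3 patterns (+1 = black, -1 = white)
X_pat = np.array([+1, -1, +1, -1, +1, -1, +1, -1, +1])  # "X", d = +1
T_pat = np.array([+1, +1, +1, -1, +1, -1, -1, +1, -1])  # "T", d = -1
patterns = [(X_pat, +1), (T_pat, -1)]

w = np.zeros(10)      # w[0] is the bias weight
alpha = 0.1           # learning rate

for epoch in range(10):
    for x, d in patterns:
        xb = np.concatenate([[1.0], x])      # prepend bias input x0 = +1
        y = 1 if w @ xb >= 0 else -1         # y = sgn(sum_i w_i x_i)
        w += alpha * xb * (d - y)            # dw_i = alpha * x_i * (d - y)
```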


Crossbar implementation

Input voltages encode the pattern, V ∝ x; each weight is a differential pair of conductances, G⁺ − G⁻ = G ∝ w. The inputs V0, V1, …, V9 drive conductance pairs G0±, …, G9±, and the output is taken from the difference of the two summed currents:

y = sgn[I⁺ − I⁻]  (parameter-analyzer-based measurement)

Alibart et al., submitted, 2012
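A numerical sketch of this differential readout (hypothetical conductance values; in the actual hardware the sums are Kirchhoff current sums measured by the parameter analyzer):

```python
import numpy as np

def crossbar_output(x, g_plus, g_minus, v_read=0.2):
    """Differential crossbar perceptron: y = sgn(I+ - I-).

    Inputs are applied as voltages V_i ~ x_i; each weight is the
    difference of a conductance pair, w_i ~ G_i+ - G_i-.
    """
    v = v_read * np.asarray(x, dtype=float)
    i_plus = v @ g_plus        # summed current in the '+' row
    i_minus = v @ g_minus      # summed current in the '-' row
    return 1 if i_plus - i_minus >= 0 else -1

# hypothetical conductances (mS) for bias + 9 pixel inputs
rng = np.random.default_rng(0)
g_plus = rng.uniform(0.1, 0.6, 10)
g_minus = rng.uniform(0.1, 0.6, 10)
x = np.concatenate([[1.0], np.ones(9)])   # bias plus an all-black pattern
print(crossbar_output(x, g_plus, g_minus))
```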

Widrow's memistor: the ADALINE concept … and its hardware implementation

Bernard Widrow, Marcian Hoff

B. Widrow and M. E. Hoff, Jr., IRE WESCON Convention Record 4, 96 (1960)

Pt/TiO2−x/Pt devices

g = I(0.2 V) / 0.2 V

Device stack: 25 nm Au / 15 nm Pt top electrode; 30 nm TiO2−x; 5 nm Ti / 25 nm Pt bottom electrode; e-beam patterned Pt protrusion (20 nm scale). [Figure: I–V loop, current (mA) vs. voltage (V)]

– Any state between ON and OFF
– In principle a dynamic system with frequency-dependent loop size, but …
– Strongly (super-exponentially) nonlinear switching dynamics
– Gray area (between −Vswitch and +Vswitch) = no change; the state is defined within the gray area

Alibart et al., submitted, 2012

Switching dynamics

[Figure: R/R0 vs. time under set/reset/read pulse trains. RESET transient: initialize to R0 = RON; SET transient: initialize to R0 = ROFF]

– Small pulse amplitude = finer state change, but may require exponentially long time
– Large pulse amplitude = faster, but at a cruder step

[Figure: current @ −200 mV vs. time for pulse voltages from −0.5 V to −1.3 V; the switching time shrinks steeply with pulse amplitude]

F. Alibart et al., Nanotechnology 23, 075201 (2012)

Nonlinear switching dynamics 

Effective barrier modulation due to:
1. heating (~kB·ΔT)
2. electric field (~E·a·q/2, with a the hop distance) acting on ion hopping (electrons and z+ ions between the electrodes)
3. phase transition or redox (oxidation/reduction) reaction

[Figure: energy vs. position; the initial activation barrier UA is lowered by ΔUA]

J. Yang et al., submitted, 2012

Speed vs. retention

With linear ionic transport, the store/write time ratio follows the drift-velocity ratio:

t_store / t_write ~ v(V_write) / v(V_store)

which is only linear in the voltage ratio — far too small. A nonlinear effect due to temperature and/or electric field makes the ratio exponential, e.g. temperature only:

t_store / t_write ~ e^{U_A / k_B T_store} / e^{U_A / k_B T_write}

D. Strukov et al., Appl. Phys. A 94, 515 (2009)
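A back-of-the-envelope check of that exponential, with illustrative numbers only (U_A = 1 eV, write temperature 500 K from local heating, storage at 300 K):

```python
import math

k_B = 8.617e-5                   # Boltzmann constant, eV/K
U_A = 1.0                        # activation barrier, eV (illustrative)
T_write, T_store = 500.0, 300.0  # local write vs. storage temperature, K

# t_store / t_write ~ exp(U_A / (k_B * T_store)) / exp(U_A / (k_B * T_write))
ratio = math.exp(U_A / (k_B * T_store) - U_A / (k_B * T_write))
print(f"t_store / t_write ~ {ratio:.1e}")  # ~5e6 for these numbers
```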

Switching statistics

[Figure: cumulative distributions of SET and RESET switching times (current @ 200 mV vs. cumulative time) for different pulse voltages, measured on 10 TiO2−x devices]

Large switching dynamics dispersion! (Alibart et al., submitted, 2012)

Variations in switching behavior

[Figure: relative conductance change g_AFTER/g_INITIAL after a single write pulse vs. pulse voltage and initial conductance g_INITIAL; g = I(0.2 V)/0.2 V; separate SET and RESET branches]

– Continuous state change (Alibart et al., submitted, 2012)

Tuning algorithm

Start (inputs: desired state I_desired, desired accuracy A_desired; initialize the write voltage to a small non-disturbing value V_WRITE = 200 mV and the voltage step V_STEP = 10 mV):
1. Read: apply V_READ = 200 mV and read the current I_current.
2. Processing: is the state reached within the required precision, i.e. (I_desired − I_current)/I_desired < A_desired? If yes → Finish.
3. Processing: check for overshoot and set the sign of the increment, i.e. sign = sign(I_current − I_desired); if V_WRITE != V_READ and sign != oldsign, then re-initialize V_WRITE = 200 mV.
4. Processing: V_WRITE = V_WRITE + sign × V_STEP; oldsign = sign.
5. Write: apply pulse V_WRITE; go to step 1.

Intuitive algorithm vs. implemented algorithm: [waveform diagrams of set/reset write pulses interleaved with non-disturbing read pulses]

F. Alibart et al., Nanotechnology 23, 075201 (2012)
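A software sketch of the implemented tuning loop, following the flowchart above; read() and write() are hypothetical stand-ins for the instrument calls, and the sign-to-polarity mapping (SET vs. RESET) is device-specific:

```python
def tune(read, write, i_desired, a_desired,
         v_read=0.2, v_step=0.01, max_pulses=10000):
    """Feedback-tune a memristor to the target read current i_desired.

    read(): apply V_READ = 200 mV and return the current.
    write(v): apply a single write pulse of amplitude v.
    """
    v_write, old_sign = v_read, 0   # start from a non-disturbing amplitude
    for _ in range(max_pulses):
        i_current = read()
        if abs(i_desired - i_current) / abs(i_desired) < a_desired:
            return True             # state reached within required precision
        sign = 1 if i_current > i_desired else -1  # sign of (I_current - I_desired)
        if v_write != v_read and sign != old_sign:
            v_write = v_read        # overshoot detected: restart from 200 mV
        v_write += sign * v_step    # ramp the write amplitude by V_STEP
        old_sign = sign
        write(v_write)
    return False                    # did not converge within max_pulses
```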

High precision tuning

[Figure: current @ −200 mV vs. pulse number while tuning TiO2−x devices (w/o protrusion) to 7, 15, 30, 60, and 120 µA targets, with increase-weight, decrease-weight, and stand-by (read-only) phases]

(g_des − g_act)/g_des < 1%, i.e. ~8-bit precision

F. Alibart et al., Nanotechnology 23, 075201 (2012)

Limitation to tuning accuracy: Random telegraph noise

[Figure: resistance time traces (R/R fluctuations, %) and normalized current noise spectra PSD/I² (Hz⁻¹) vs. frequency for device resistances from 0.5 kΩ to 5 kΩ]

– Solid-state electrolyte (electrochemical) devices are noisier
– The higher the resistance, the larger the noise
– For a-Si devices the limit is ~5–6-bit precision (but no optimization was attempted)

Ligang Gao et al., VLSI-SoC, 2012

Perceptron experimental setup

– Switching matrix (Agilent E5250A)
– Arbitrary waveform generator: B1530
– Current measurement: B1530 (fast IV mode)
– Ground (GNDU, Agilent)
– Agilent B1500 parameter analyzer
– Wires implementing the crossbar circuit
– Chip-packaged, wire-bonded memristive devices

Alibart et al., submitted, 2012

Perceptron: Ex-situ training

Evolution of synaptic conductance upon sequential tuning: [Figure: synaptic weights g_i+, g_i− (mS), i = 1…10, vs. pulse number; alternating read and write pulses with amplitudes within ±Vswitch; final weights after programming]

– Weight import accuracy ~10%
– Crossbar half-select trick: half-selected devices are only slightly affected (>5-bit precision)

Alibart et al., submitted, 2012

Perceptron: In‐situ training

Evolution of synaptic conductance upon parallel tuning (V_train = 0.9 V and 1 V):

In-situ training rule: Δg_i± = ±α · x_i · (d(p) − y(p)), with α a function of (V, g). Each update is applied in four steps, s1–s4, covering the sign combinations (s1: x = +1; s2: x = −1; s3: d = +1; s4: d = −1); selected devices see ±V_train while half-selected ones see only ±V_train/2, within the ±Vswitch gray area.

[Figure: Δg (mS) for g1± and g4± vs. training epoch, with the corresponding pulse waveforms at each electrode]

Alibart et al., submitted, 2012
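In software, the four pulse steps collapse into one signed, fixed-magnitude update per conductance pair (all devices of the row are updated in parallel); a sketch under that simplification, ignoring α's dependence on V and g:

```python
import numpy as np

def insitu_epoch(g_plus, g_minus, patterns, alpha=0.01):
    """One in-situ training epoch over differential conductance pairs."""
    for x, d in patterns:            # x entries are +1/-1, d is the class
        y = 1 if x @ (g_plus - g_minus) >= 0 else -1
        if y == d:
            continue                 # no error -> no training pulses
        delta = alpha * x * (d - y)  # dg_i(+/-) = (+/-)alpha * x_i * (d - y)
        g_plus += delta
        g_minus -= delta
        np.clip(g_plus, 0.0, None, out=g_plus)    # conductances stay positive
        np.clip(g_minus, 0.0, None, out=g_minus)
    return g_plus, g_minus
```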

Results

[Figure: histograms of the number of patterns vs. I⁺ − I⁻ for the "X" and "T" classes]

Ex-situ training:
– Initial (random weights)
– Weight import accuracy ~40% → ~10% → ~2%: the two classes separate progressively

In-situ training:
– Initial (random weights)
– After 10 epochs with V_train = 0.9 V
– After 7 more epochs with V_train = 1 V

– 3-bit precision is enough for the considered task

Alibart et al., submitted, 2012

Big picture

Tight integration with CMOS logic (CMOL): a memristive crossbar add-on on top of the CMOS stack implements a multi-layer perceptron network. Each weight w_ji is stored as a memristor conductance g_ji, and each CMOS cell computes the neuron output y_j from the weighted sum Σ_i x_i g_ji.

Spiking Networks and Spike-Timing-Dependent Plasticity (STDP)

Spiking vs. firing-rate neural networks:

Firing-rate networks: the average frequency matters (high frequency → level 1, low frequency → level 0).

Spiking networks: the relative timing of the spikes matters, and the delay between neurons matters; this enriches the functionality.

Spiking neural networks: spatiotemporal processing. Known to happen in biology, e.g. detecting the direction of a sound with two sensors and two neurons.

Polychronization: Computation with Spikes

• According to Izhikevich, accounting for the timing of spikes allows the capacity of the network to be increased beyond that of Hopfield networks.

Hopfield Networks

Binary Hopfield network

v_i(t+1) = sgn[Σ_j w_ij v_j(t)]

Capacity: p_max = N / log N (a minimal sketch of the network follows below)
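A minimal sketch of the binary Hopfield dynamics quoted above, with standard Hebbian storage (illustrative; the capacity statement concerns random patterns in the large-N limit):

```python
import numpy as np

def hopfield_store(patterns):
    """Hebbian weights w_ij = sum_p v_i^p v_j^p, zero diagonal."""
    P = np.asarray(patterns, dtype=float)
    w = P.T @ P
    np.fill_diagonal(w, 0.0)
    return w

def hopfield_recall(w, v, steps=10):
    """Iterate v_i(t+1) = sgn(sum_j w_ij v_j(t))."""
    for _ in range(steps):
        v = np.where(w @ v >= 0, 1, -1)
    return v

stored = np.array([1, -1, 1, -1, 1, -1, 1, -1])
w = hopfield_store([stored])
noisy = stored.copy()
noisy[0] = -noisy[0]                                      # flip one bit
print(np.array_equal(hopfield_recall(w, noisy), stored))  # True
```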

Polychronization: Computation with Spikes

Due to STDP, the system can self-organize to activate various polychronous groups.

Spike Timing Dependent Plasticity 

STDP Implementation (first attempt)

"… we have implemented a CMOS neuron circuit to convert the relative timing information of the neuron spikes into pulse-width information seen by the memristor synapse"

STDP Implementation Proposal for Memristors

Assumed rate of conductance change as a function of the applied voltage (a sketch of the targeted STDP window follows below).
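For reference, the exponential STDP window such proposals aim to reproduce, with illustrative constants (the actual conductance-change-vs-voltage curve is device-specific):

```python
import math

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau=0.02):
    """Weight change vs. spike timing dt = t_post - t_pre (seconds).

    Pre-before-post (dt > 0) potentiates; post-before-pre depresses.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # potentiation branch
    return -a_minus * math.exp(dt / tau)       # depression branch

for dt in (-0.04, -0.01, 0.01, 0.04):
    print(f"dt = {dt:+.2f} s -> dw = {stdp_dw(dt):+.5f}")
```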

STDP Implementation with PCM

Long-Term Depression and Short-Term Potentiation

Electronic Pavlov’s Dog

Snider's Spiking Networks

Example: Network Self-Organization (Spatial Orientation Filter Array)

[Diagram: adaptive recurrent network; inputs x_i feed an output through excitatory (+) and inhibitory (−) connections]

G. Snider, Nanotechnology 18, 365202 (2007)