Page 1

Collective Intelligence

a gentle introduction to data analysis for media arts practices

topics

I applied statistics
II applied machine learning (for collective intelligence)
III text processing
IV genetic programming
V neural networks

Script, version 1.4 fall 2010

Page 2

Sources

textbooks:

- Beale + Jackson, Neural Computing
- Hertz, Krogh, Palmer, Introduction to the Theory of Neural Computation
- Mehrotra, Mohan, Ranka, Elements of Artificial Neural Networks
- Segaran, Programming Collective Intelligence (O'Reilly)

online:

overview
- Pfeifer, Neural Nets
  http://www.ifi.unizh.ch/ailab/teaching/NN2007/script/nn230606.pdf
- Rojas, Neural Networks – A Systematic Introduction
  http://page.mi.fu-berlin.de/rojas/neural/index.html

intro + backprop
http://www.ibm.com/developerworks/library/l-neural/

forward prop
http://ffnet.sourceforge.net/

hopfield nets
http://www-128.ibm.com/developerworks/library/l-neurnet/?ca=dgr-lnxw961NeuralNet

SOM (Self Organizing Maps)
http://www.len.ro/work/ai/som-neural-networks
http://en.wikipedia.org/wiki/Self-organizing_map

Page 3

Applets

Backpropagation NN for optical character recognition:
http://www.sund.de/netze/applets/BPN/bpn2/ochre.html

Backpropagation NN for currency relations prediction:
http://www.obitko.com/tutorials/neural-network-prediction/forex-prediction.html

Decision tree to recognize whales:
http://www.myacquire.com/aiinc/whalewatcher/

Neural nets with real-time feedback to control an object:
http://neuron.eng.wayne.edu/bpBallBalancing/ball5.html

Hopfield net for associative memory:
http://lcn.epfl.ch/tutorial/english/hopfield/html/index.html

3D Kohonen feature map:
http://fbim.fh-regensburg.de/~saj39122/jfroehl/diplom/e-sample.html

Also:

UCI machine learning repository (data sets)
http://archive.ics.uci.edu/ml/

UCI machine learning data set for recognition of mushrooms:
http://archive.ics.uci.edu/ml/datasets/Mushroom

Page 4

Online resources, continued

neural nets as part of a machine learning package
http://montepython.sourceforge.net/

neural nets lite
http://annevolve.sourceforge.net/index.php

neural nets with matlab or octave (http://www.gnu.org/software/octave/)
http://www.ncrg.aston.ac.uk/netlab/

forecasting with artificial neural networks
http://www.neural-forecasting.com/tutorials.htm

Stuttgart Neural Network Simulator
http://www-ra.informatik.uni-tuebingen.de/SNNS/

University level courses on NN

Cambridge
http://www.inference.phy.cam.ac.uk/mackay/itprnn/p0.html

Uni Zurich
http://www.ifi.unizh.ch/ailab/teaching/NN2007/

other
http://www.casresearch.com/

Page 5

There are several ways to compute... (Rojas)

Page 6

The interest in neural nets is fueled by the interest in creating artificial intelligence on par with (or exceeding) human intelligence.

Charles Babbage - Analytical Engine -

Calculated general formulas under the control of a looping program stored on punch cards (1834).

Page 7

The main limitations for digital computers in the simulation of biological processes are the extreme temporal and spatial resolution demanded by some biological processes, and the limitations of the algorithms that are used to model biological processes.

EDSAC - Electronic Delay Storage Automatic Calculator -

Maurice Wilkes - The first general purpose stored program computer - An example of the von Neumann architecture, in which data and instructions share a common data path. - University of Cambridge (1949)

Intel Core 2 Duo / Xeon: 3 GHz clock, 820,000,000 transistors, 45 nm manufacturing technology (2007)

Page 8

Neural networks are modelled after biological networks (to the extent that they are understood). Neural networks are synthetic and often called artificial neural nets, to distinguish them from their biological counterparts.

The Blue Brain Project provides a snapshot of the state of the art in neural computing (hardware):

By exploiting the computing power of Blue Gene, the Blue Brain Project aims to build accurate models of the mammalian brain from first principles. The first phase of the project is to build a cellular-level (as opposed to a genetic- or molecular-level) model of a 2-week-old rat somatosensory neocortex corresponding to the dimensions of a neocortical column (NCC) as defined by the dendritic arborizations of the layer 5 pyramidal neurons.

The statistical variations within each electrical class are also used to generate subtle variations in discharge behaviour in each neuron. So, each neuron is morphologically and electrically unique.

One neuron is then mapped onto each processor and the axonal delays are used to manage communication between neurons and processors. Effectively, processors are converted into neurons, and MPI (message-passing interface)- based communication cables are converted into axons interconnecting the neurons — so the entire Blue Gene is essentially converted into a neocortical microcircuit.

Page 9

Page 10

Differences between biological brains and computers, I (Pfeifer)

Parallelism. Computers function, in essence, in a sequential manner, whereas brains are massively parallel. Moreover, the individual neurons are densely connected to other neurons: a neuron has between just a few and 10,000 connections. The human brain has roughly 10^11 neurons and 10^14 synapses, whereas modern computers, even parallel supercomputers, typically have no more than 1000 parallel processors. In addition, the individual "processing units" in the brain are relatively simple and very slow, whereas the processing units of computers are extremely sophisticated and fast (cycle times in the range of nanoseconds).

This point is illustrated by the "100 step constraint". If a subject in a reaction time task is asked to press a button as soon as he or she has recognized a letter, say "A", this takes roughly 1/2 s. If we assume that the operating cycle of a cognitive operation is on the order of 5-10 ms, this yields a maximum of 200 operations per second. How is it possible that recognition can be achieved with only 200 cycles? The massive parallelism and the high connectivity of neural systems appear to be core factors.

Page 11

Differences between biological brains and computers, II (Pfeifer)

Graceful degradation is a property of natural systems that modern computers lack to a large extent, unless it is explicitly provided for. The term is used to designate systems that still operate - at least partially - if certain parts malfunction or if the situation changes in unexpected ways.

Noise tolerance means that if there is noise in the data or inside the system, the function is not impaired, at least not significantly. The same holds for fault tolerance: if certain parts malfunction, the system does not grind to a halt, but continues to work.

Ability to learn. In fact, most natural systems learn continuously, as soon as there is a change in the environment. For humans it is impossible not to learn. Neural networks are particularly interesting learning systems because they are massively parallel and distributed. Along with the ability to learn goes the ability to forget. Natural systems do forget, whereas computers don't. Forgetting can be beneficial for the functioning of the organism: avoiding overload and unnecessary detail, generalization, forgetting undesirable experiences, etc.

Page 12

Differences between biological brains and computers, III

Learning always goes together with memory. The organization of memory in a computer is completely different from the one in the brain. Computer memories are accessed via addresses, there is a separation of program and data, and items, once stored, are never forgotten, unless they are overwritten for some reason. Brains, by contrast, do not have "addresses", there is no separation of "programs" and "data", and, as mentioned above, they have a tendency to forget. When natural brains search for memories, they use an organizational principle which is called "associative memory" or "content-addressable memory": access is via part of the information searched for, not through an address.

The paradox of the expert provides another illustration of the difference between brains and computers. Traditional thinking suggests: the larger the database, i.e. the more comprehensive an individual's knowledge, the longer it takes to retrieve one particular item. This is certainly the case for database systems and knowledge-based systems. In human experts, the precise opposite seems to be the case: the more someone knows, the faster he or she can actually reproduce the required information. The parallelism and the high connectivity of natural neural systems are important factors underlying this amazing feat.

Context effects and constraint satisfaction. Naturally intelligent systems all have the ability to take context into account. This is illustrated in Figure 1, where the center letter is identical for both words, but we naturally, without much reflection, identify the one in the first word as an "H", and the one in the second word as an "A". The adjacent letters, which in this case form the context, provide the necessary constraints on the kinds of letters that are most likely to appear in this context. In understanding everyday natural language, context is also essential: if we understand the social situation in which an utterance is made, it is much easier to understand than out of context.

Page 13

Neural nets offer a different model of computing – a biologically inspired model

"The explanation of important aspects of the physiology of neurons set the stage for the formulation of artificial neural network models which do not operate sequentially, as Turing machines do. Neural networks have a hierarchical multilayered structure which sets them apart from cellular automata, so that information is transmitted not only to the immediate neighbors but also to more distant units. In artificial neural networks one can connect each unit to any other. In contrast to conventional computers, no program is handed over to the hardware – such a program has to be created, that is, the free parameters of the network have to be found adaptively." (Rojas)

Consequence: one needs to model 'the brain' – in as much detail as necessary...

Page 14

Illustrations of neural cell activity (Rojas)

Page 15

Action potential illustrated (Rojas p17)

Page 16

Biological Neuron and its abstraction (McCulloch-Pitts model; illustrations from Jackson and Hertz)

Page 17

THE FIVE BASIC CHARACTERISTICS (Pfeifer)

(1) The characteristics of the node. We use the terms nodes, units, processing elements, neurons, and model neurons synonymously. We have to define the way in which the node sums the inputs, how they are transformed into a level of activation, how this level of activation is updated, and how it is transformed into an output which is transmitted along the axon.

(2) The connectivity. It must be specified which nodes are connected to which, and in what direction.

(3) The propagation rule. It must be specified how a given activation that is traveling along an axon is transmitted to the neurons to which it is connected.

(4) The learning rules. It must be specified how the strengths of the connections between the neurons change over time.

(5) Embedding the network in the physical system. If we are interested in neural networks for embedded systems, we must always specify how the network is embedded, i.e. how it is connected to the sensors and the motor components.

Page 18

a_i: activation level
h_i: summed weighted input into the node (from other nodes)
o_i: output of node (often identical with a_i)
w_ij: weights connecting node j to node i
i: inputs into the network, or outputs from other nodes

Node characteristics
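To make these node characteristics concrete, here is a minimal sketch in Python (the function and parameter names are mine; the step and sigmoid variants are standard choices rather than anything prescribed by the slides) of how a single node turns weighted inputs into an output:

import math

def node_output(weights, inputs, activation="step", theta=0.0, beta=0.5):
    # h: summed weighted input into the node (from other nodes)
    h = sum(w * x for w, x in zip(weights, inputs))
    if activation == "step":
        a = 1.0 if h > theta else 0.0                    # threshold unit (McCulloch-Pitts style)
    else:
        a = 1.0 / (1.0 + math.exp(-2.0 * beta * h))      # sigmoid, g(h) = 1/(1+e^(-2bh))
    return a                                             # o: output, here identical with a

print(node_output([0.8, -0.5, 0.3], [1, 1, 0]))             # step unit
print(node_output([0.8, -0.5, 0.3], [1, 1, 0], "sigmoid"))  # sigmoid unit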

Page 19

Activation

Page 20

 0     0    0.8   0    0
 0     0    0     0    0
 0.7   0.4  0     0    0
 1.0  -0.5  0     0    0
 0.6   0.9  0     0    0

the connection matrix (with specific weights)

w11 w12 w13 w14 w15
w21 w22 w23 w24 w25
w31 w32 w33 w34 w35
w41 w42 w43 w44 w45
w51 w52 w53 w54 w55

the connection matrix (2 inputs, 3 outputs)

Connectivity

Page 21

Propagation rules
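As a sketch of how connectivity and propagation combine (assuming the common convention that w_ij weights the output of node j into node i; the sigmoid squashing and the reuse of the specific 5-node matrix from the previous page are my choices):

import math

W = [[0.0,  0.0, 0.8, 0.0, 0.0],
     [0.0,  0.0, 0.0, 0.0, 0.0],
     [0.7,  0.4, 0.0, 0.0, 0.0],
     [1.0, -0.5, 0.0, 0.0, 0.0],
     [0.6,  0.9, 0.0, 0.0, 0.0]]

def propagate(W, o):
    # h_i = sum_j w_ij * o_j, then squash to obtain the new outputs
    h = [sum(W[i][j] * o[j] for j in range(len(o))) for i in range(len(W))]
    return [1.0 / (1.0 + math.exp(-hi)) for hi in h]

o = [1, 0, 0, 0, 0]        # activity on node 1 only
print(propagate(W, o))     # activity spreads to the connected nodes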

Page 22

Supervised learning
In the narrow technical sense, supervised means the following: if for a certain input the corresponding output is known, the network is to learn the mapping from inputs to outputs. In supervised learning applications, the correct output must be known and provided to the learning algorithm. The task of the network is to find the mapping. The weights are changed depending on the magnitude of the error that the network produces at the output layer: the larger the error, i.e. the discrepancy between the output that the network produces and the correct output value, the more the weights change.

Reinforcement learning
If the teacher only tells a student whether her answer is correct or not, but leaves the task of determining why the answer is correct or false to the student, we have an instance of reinforcement learning. The problem of attributing the error (or the success) to the right cause is called the credit assignment or blame assignment problem. It is fundamental to many learning theories. Reinforcement learning is also used to designate learning where a particular behavior is to be reinforced. Typically, the robot receives a positive reinforcement signal if the result was good, no reinforcement or a negative reinforcement signal if it was bad. If the robot has managed to pick up an object, has found its way through a maze, or if it has managed to shoot the ball into the goal, it will get a positive reinforcement.

Unsupervised learning
In unsupervised learning, there is no a priori output and no information is given from an outside observer to advise or correct. Rather, a model is built and adjusted so that it fits the observed data. Mainly two categories of learning rules fall under this heading: Hebbian learning and Kohonen networks.

Learning
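As a minimal sketch of the supervised case described above (the function name and the learning rate eta are my assumptions), the error-proportional weight change can be written as a single update:

def supervised_update(weights, inputs, desired, actual, eta=0.1):
    # delta rule: the larger the output error, the larger the weight change
    error = desired - actual
    return [w + eta * error * x for w, x in zip(weights, inputs)]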

Page 23

Embedding the network

Swarm robot controller with a neural network
http://www5.epfl.ch/swis/page2515.html
design details: http://infoscience.epfl.ch/search.py?recid=50239

Page 24

GA and Neural Networks

GAs can be used to evolve aspects of neural networks.

Evolvable aspects include:

- initial weights

- network architecture (number of inputs, outputs)

- learning rule

[Schematic diagram of a simple feedforward network: an input pattern propagates through weights and activations to an output pattern; back propagation (weight correction) applies corrections from the 'supervisor'.]
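Returning to the evolvable aspects listed above, here is a toy sketch (my own construction, not code from the course) of the first item: a small GA that evolves the weights of a fixed two-input threshold unit toward OR behavior.

import random

def fitness(w):
    # count how many of the four OR cases the threshold unit gets right
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    out = lambda x: 1 if w[0] + w[1] * x[0] + w[2] * x[1] > 0 else 0
    return sum(out(x) == d for x, d in data)

pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
for gen in range(50):
    pop.sort(key=fitness, reverse=True)
    if fitness(pop[0]) == 4:
        break
    # keep the better half, refill with mutated copies
    parents = pop[:10]
    pop = parents + [[w + random.gauss(0, 0.2) for w in random.choice(parents)]
                     for _ in range(10)]
print(pop[0], fitness(pop[0]))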

Page 25

Classic perceptron (Rojas) Weights of a perceptron (Rojas)

Perceptrons are weighted threshold elements. More formally (as in Wikipedia): "The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. It can be seen as the simplest kind of feedforward neural network: a linear classifier."

Limitations of perceptrons:
No diameter-limited perceptron can decide whether a geometric figure is connected or not.

Proof:http://page.mi.fu-berlin.de/rojas/neural/chapter/K3.pdf

Page 26

Single Layer Perceptron example

[Diagram: a perceptron with inputs E1, E2, E3 and weights w1, w2, w3, computing O = g(Σ_j w_j E_j) = g(w · E).]

[Diagram: the example net, with bias input E0 and inputs E1, E2, weights w0, w1, w2, and output O (desired or actual).]

Desired behavior: AND logic
0 0 > 0
0 1 > 0
1 0 > 0
1 1 > 1

Perceptron learning rule, with positive class P = {(1,1)} and negative class N = {(0,0), (0,1), (1,0)}, and dw_i = E_i:

if E ∈ P and w^T E > 0, then ok
if E ∈ P and w^T E <= 0, then w_i(t) = w_i(t-1) + dw_i
if E ∈ N and w^T E <= 0, then ok
if E ∈ N and w^T E > 0, then w_i(t) = w_i(t-1) - dw_i

start:

t | E0 E1 E2 | desired output | w0   w1  w2  | Sum(w_j E_j) | dw0 dw1 dw2 | actual output
0 | 1  0  0  | 0              | -0.3 0.8 0.6 | -0.3         | 0   0   0   | 0

Page 27

The table fills in row by row as the learning rule is applied to each input pattern in turn:

t | E0 E1 E2 | desired output | w0   w1   w2   | Sum(w_j E_j) | dw0 dw1 dw2 | actual output
0 | 1  0  0  | 0              | -0.3  0.8  0.6 | -0.3         |  0  0   0   | 0
1 | 1  0  1  | 0              | -0.3  0.8  0.6 |  0.3         | -1  0  -1   | 1
2 | 1  1  0  | 0              | -1.3  0.8 -0.4 | -0.5         |  0  0   0   | 0
3 | 1  1  1  | 1              | -1.3  0.8 -0.4 | -0.9         |  1  1   1   | 0
4 | 1  0  0  | 0              | -0.3  1.8  0.6 | -0.3         |  0  0   0   | 0
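A short script (a sketch; the helper names are mine) reproduces the table, cycling through the four input patterns in the order shown:

def out(s):
    return 1 if s > 0 else 0

w = [-0.3, 0.8, 0.6]                          # start weights w0, w1, w2
patterns = [([1, 0, 0], 0), ([1, 0, 1], 0),
            ([1, 1, 0], 0), ([1, 1, 1], 1)]   # ([E0, E1, E2], desired)

for t in range(5):
    E, desired = patterns[t % 4]
    s = sum(wi * ei for wi, ei in zip(w, E))    # Sum(w_j E_j)
    actual = out(s)
    dw = [(desired - actual) * ei for ei in E]  # +E, -E, or no change
    print(t, E, desired, w, round(s, 1), dw, actual)
    w = [wi + di for wi, di in zip(w, dw)]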

Page 28

Single layer perceptron networks cannot solve linearly inseparable problems.

[Figure: linearly separable vs. linearly inseparable point sets.]

The XOR function is a linearly inseparable function: there is no 'line' that can separate the two classes (red and green).

XOR:
0 0 > 0
0 1 > 1
1 0 > 1
1 1 > 0

Page 29

Many classes of problems are not linearly separable. A new model is required to deal with this. It is called the multilayer perceptron.

The multilayer perceptron also uses a different thresholding function than the step function of the single layer perceptron: the sigmoid function.

[Figure: input layer, hidden layer, output layer]

sigmoid activation function:
g(h) = [1 + exp(-2βh)]^(-1) = 1 / (1 + e^(-2βh))

Page 30

The multilayer perceptron also has a different learning rule than the single layer perceptron. It is called the 'generalized delta rule' or the backpropagation rule:

Δw_ij = η δ_i y_j

The idea is to define an error function (δ) that represents the difference between the network's current output (before the final result has been obtained) and the correct output that is desired. Comparison with the desired response enables the connections between the nodes (the weights) to be altered so that the network can produce a more accurate response on the subsequent iteration. In order to achieve the desired output, the error function should decrease from one iteration to the next. This is achieved by adjusting the weights on the links between the units; the delta rule calculates the value of the error function for that particular input, and then back-propagates the error from one layer to the previous one. Each unit in the net has its weights adjusted so that the overall error is decreased.

Because one needs to know the correct pattern, the backpropagation rule is a supervised learning technique.

Page 31

The derivation of the learning rule for multilayer perceptrons is somewhat involved because the learning occurs on hidden and output layers. The interested reader is referred to Rojas (p. 151) and Pfeifer (p. 26ff) for the mathematical details. Here is an example that shows how the learning rules are applied for a simple multilayer perceptron.

[Diagram: a minimal multilayer perceptron. Input layer: X1, X2 and a bias; hidden layer: one node with output Oh, fed through weights w0 (bias), w1 (X1), w2 (X2); output layer: one node with output Oo, fed through weights w3 (from Oh) and w4 (from the bias). δ3 is the delta at the hidden node, δ4 the delta at the output node.]

Logic: OR
0 0 > 0
0 1 > 1
1 0 > 1
1 1 > 1

Activation: sigmoid
g(h) = 1 / (1 + e^(-2βh)), β = 0.5
Learning rate η = 1

Learning rule

A) hidden layer:
output Oh = g(Σ w_i x_i)
delta δ_i = y_i (1 - y_i) Σ_k δ_k w_ik, where k runs over the layer above (k = i+1)

B) output layer:
output Oo = g(Σ w_k v_k)
delta δ_i = y_i (1 - y_i)(d_i - y_i), d: desired, y: actual

C) delta weights:
Δw_ij = η δ_i y_j

Page 32

It is best to set up a table view of all the nodes and weights and change them all together at each time step. The bold numbers are given at the start of the sequence.

T | W0      W1      W2     W3     W4     | X1 X2 | Oh     Oo     | Δw4    Δw3    Δw2    Δw1    Δw0
0 | 0.2000 -0.1000  0.0000 0.1000 0.2000 | 0  1  | 0.5490 0.5630 | 0.1074 0.0590 0.0027 0.0000 0.0027
1 | 0.2027 -0.1000  0.0027 0.1590 0.3074 | 1  0  |

Hidden node: Oh = g(Σ w_i x_i) = g(x0 w0 + x1 w1 + x2 w2) = g(0.2) = 0.549

Output node: Oo = g(Σ w_k v_k) = g(Oh w3 + x0 w4) = g(0.255) = 0.563

Deltas:
δ4 = δOo = Oo (1 - Oo)(d - Oo) = 0.563 (1 - 0.563)(1 - 0.563) = 0.1074
δ3 = δOh = Oh (1 - Oh) δ4 w3 = 0.549 (1 - 0.549) 0.1074 · 0.1 = 0.00266

Weight changes, Δw_ij = η δ_i y_j:
Δw0 = 1 · δ3 · 1 = 0.00266
Δw1 = 1 · δ3 · 0 = 0
Δw2 = 1 · δ3 · 1 = 0.00266
Δw3 = 1 · δ4 · Oh = 0.0590
Δw4 = 1 · δ4 · bias = 0.1074

New weights, w_i(t+1) = w_i(t) + Δw_i:
w0(t+1) = w0(t) + Δw0 = 0.2 + 0.00266 = 0.2027
w1(t+1) = w1(t) + Δw1 = -0.1 + 0 = -0.1
w2(t+1) = w2(t) + Δw2 = 0 + 0.00266 = 0.0027
w3(t+1) = w3(t) + Δw3 = 0.1 + 0.0590 = 0.1590
w4(t+1) = w4(t) + Δw4 = 0.2 + 0.1074 = 0.3074
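A short script (a sketch; the variable names are mine) verifies this hand calculation:

import math

def g(h, beta=0.5):
    # sigmoid activation, g(h) = 1 / (1 + e^(-2*beta*h))
    return 1.0 / (1.0 + math.exp(-2.0 * beta * h))

eta = 1.0
w0, w1, w2, w3, w4 = 0.2, -0.1, 0.0, 0.1, 0.2
x0, x1, x2 = 1, 0, 1                 # bias plus the input pattern (0, 1)
d = 1                                # desired OR output for (0, 1)

Oh = g(x0 * w0 + x1 * w1 + x2 * w2)  # hidden node
Oo = g(Oh * w3 + x0 * w4)            # output node
d4 = Oo * (1 - Oo) * (d - Oo)        # output delta
d3 = Oh * (1 - Oh) * d4 * w3         # hidden delta
dw = [eta * d3 * x0, eta * d3 * x1, eta * d3 * x2,   # dw0, dw1, dw2
      eta * d4 * Oh, eta * d4 * x0]                  # dw3, dw4
print([round(v, 4) for v in [Oh, Oo, d4, d3]])   # [0.5498, 0.5634, 0.1074, 0.0027]
print([round(v, 4) for v in dw])                 # [0.0027, 0.0, 0.0027, 0.059, 0.1074]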

Page 33

Page 34

Backpropagation implemented in Python (changes to the training function of the bpnn.py code example – see class website):

class NN:
    def update(self, inputs): ...
    def backPropagate(self, targets, N, M): ...
    def test(self, patterns): ...

def main():
    # Teach network a pattern
    train_pat = [
        [[0,0,0], [1]],
        [[0,0,1], [0]],
        [[0,1,0], [0]],
        [[0,1,1], [0]],
        [[1,0,0], [0]],
        [[1,0,1], [0]],
        [[1,1,0], [0]],
        [[1,1,1], [1]]
    ]
    test_pat = [[[0,0,0.99]]]  # test the network on a new pattern

    # create a network with 3 inputs, 1 hidden layer (2 nodes), and 1 output (based on training data)
    n = NN(3, 2, 1)
    print "training.."
    n.train(train_pat)
    print "testing.."
    n.test(test_pat)

Page 35

Hopfield networks

Hopfield networks are auto-associative recurrent memory networks. The back-propagation networksintroduced so far have only retained information about the past in terms of their weights, which havebeen changed according to the learning rules. Recurrent networks also retain information about the pastin terms of activation levels: activation is preserved for a certain amount of time: it is ‖passed back‖through the recurrent connections.

One important type of recurrent network is the Hopfield net. Associative memories - also called content-addressable memories - can be realized with Hopfield nets. Content addressable memories are capableof reconstructing patterns even if only a small portion of the entire pattern is available.

The Hopfield net consists of a number of nodes, each connected to every other node. It is a fullyconnected network and symmetrically weighted.

[Diagram: a three-node Hopfield net (nodes 1, 2, 3) with weighted connections w12, w21, w13, w31, w23, w32 between every pair of nodes.]

symmetric weights: w_uv = w_vu, w_uu = 0

activation: signum function (output +1 for positive input, -1 for negative input)

Page 36

[Diagram: the same three-node Hopfield net.]

weight matrix:

 0   w12  w13
w21   0   w23
w31  w32   0

Each node has, like the single-layer perceptron, a threshold and a step-function. The nodes calculate the weighted sum of their inputs minus the threshold value, passing that through the step function to determine their output state. The net takes only 2-state inputs, binary (0, 1) or bipolar (-1, 1). What distinguishes the Hopfield net from other networks is the way in which it produces a solution.

Inputs to the network are applied to all nodes at once and consist of a set of starting values (-1, +1). The network then cycles through various states and attempts to converge on a stable solution, which occurs when the nodes no longer change. The output of the network is then the value of all the nodes when the network has reached this stable, steady state (the best compromise the nodes – all influencing each other – can reach).

This is different from the approach in the multilayer perceptron model, where inputs are given and the network produces an output (the solution). In the Hopfield net, the first output is taken as the new input, which in turn produces another output, etc.

Page 37

[Diagram: the three-node Hopfield net with weights w12, w21, w13, w31, w23, w32.]

Hopfield learning rule:

S_i(t+1) = sgn(Σ_j w_ij S_j(t))

Weights (n^2 - n of them in an n-node network):

w_ij = 1/n (E_i E_j)            // one pattern
w_ij = 1/n (Σ_μ E_i^μ E_j^μ)    // p patterns

Stability is achieved when:

S_i(t+1) = S_i(t)               // across 'two epochs'

The weights between the nodes are initialized from an example pattern, associating each pattern with itself (the teaching stage). The output of the net is then forced to match that of an imposed unknown pattern at time zero. The net is then allowed to iterate freely in discrete time steps, until it reaches a stable situation – when the output remains unchanged (recognition stage). The auto-association of patterns means that the presentation of a 'corrupt' input pattern will result in the reproduction of the perfect pattern as the output – the network is said to act as a content addressable memory.
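A minimal sketch of this store-and-recall cycle (bipolar patterns and asynchronous updates assumed; all names are mine):

import random

def train(patterns):
    # Hebbian weights: w_ij = 1/n * sum_p E_i^p * E_j^p, with w_ii = 0
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / float(n)
    return w

def recall(w, state, steps=100):
    # asynchronous updates: S_i <- sgn(sum_j w_ij S_j) until nothing changes
    s = list(state)
    for _ in range(steps):
        i = random.randrange(len(s))
        h = sum(w[i][j] * s[j] for j in range(len(s)))
        if h != 0:
            s[i] = 1 if h > 0 else -1
    return s

stored = [1, -1, 1, -1, 1, -1]       # the pattern to memorize
w = train([stored])
corrupt = [1, -1, 1, -1, -1, -1]     # one bit flipped
print(recall(w, corrupt))            # reproduces the stored pattern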

Page 38

Hopfield net example

Two nodes S1 and S2, connected by weights w12 = w21 = -0.5.

time step | synchronous S1 S2 | synchronous S1 S2 | asynchronous S1 S2
0         | -1  1             |  1  1             |  1  1
1         | -1  1             | -1 -1             | -1  1
2         | stable            |  1  1             | -1  1
3         |                   | -1 -1             | -1  1
          |                   | unstable          | stable

Results can be dependent on the method of updating. Synchronous updates can lead to unstable / oscillating states. Asynchronous updates result in stable states.
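This small script (a sketch; the -0.5 weight follows the example above) reproduces both behaviors:

def sgn(h):
    return 1 if h > 0 else (-1 if h < 0 else 0)

w = -0.5                                   # w12 = w21

# synchronous updates: both nodes switch at once, the state oscillates
s1, s2 = 1, 1
for t in range(4):
    print(t, (s1, s2))
    s1, s2 = sgn(w * s2), sgn(w * s1)

# asynchronous updates: one node at a time, the state settles at (-1, 1)
s = [1, 1]
for i in (0, 1, 0, 1):
    s[i] = sgn(w * s[1 - i])
    print(s)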

Hopfield nets are often described in terms of an energy landscape. The energy landscape has valleys (minima) that represent the patterns stored in the network. An unknown pattern represents a particular point in the energy landscape; as the network iterates its way to a solution, the point moves through the landscape towards one of the valleys. The Hopfield net's energy function can be formulated as (Pfeifer p. 57):

E = -1/2 Σ_ij (w_ij S_i S_j)

Storing a pattern is then equivalent to minimizing the energy function of that particular pattern – it always decreases and then finally occupies a valley point in the energy landscape. Storage is limited if other existing patterns are not to be overwritten. The number of storable patterns in a Hopfield net is only about ~0.15N, where N is the number of nodes (see Pfeifer p. 52 for details).
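The energy function translates directly into code (a sketch; w is a weight matrix such as the Hebbian one above, s a bipolar state vector); stored patterns sit at minima of this quantity:

def energy(w, s):
    # E = -1/2 * sum_ij w_ij * S_i * S_j
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(n))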

The update rule in detail:

Σ_j w_ij S_j > 0  ->  S_i = +1
Σ_j w_ij S_j = 0  ->  S_i unchanged
Σ_j w_ij S_j < 0  ->  S_i = -1

Applied to the two-node net above:

S1(t+1) = sgn(w12 · S2(t))

Page 39

Hopfield net application example (Beale p. 143ff)

Page 40

The Hopfield net (like other networks) need not be implemented exclusively in a digital computer. Here is an example of a Hopfield net implemented in an optical system:

(Rojas, p. 377)

The amount of light that goes through the mask is proportional to the product of x_i and w_ij at each position ij of the mask.

The incoming light at the j-th detector represents the total excitation of unit j.