Self-Organizing Map (SOM). Unsupervised neural networks, equivalent to clustering.


  • Slide 1
  • Self-Organizing Map (SOM)
  • Slide 2
  • Unsupervised neural networks, equivalent to clustering. Two layers, input and output. The input layer represents the input variables. The output layer consists of neurons arranged in a single line (one-dimensional) or in a two-dimensional grid. The main feature of the network is its weights.
  • Slide 3
  • Learning means adapting the weights. Each output neuron receives the inputs through the weights; the weight vector has the same dimensionality as the input vector. The output of each neuron is its activation, the weighted sum of its inputs (i.e. a linear activation function), e.g. u_1 = x_1 w_11 + x_2 w_21.
  • Slide 4
  • The objective of learning: project high-dimensional data onto 1D or 2D output neurons. Each neuron incrementally learns to represent a cluster of the data. The weights are adjusted incrementally; the weight vectors of the neurons are called codebook vectors (codebooks).
  • Slide 5
  • Competitive learning. The so-called competitive learning (winner-takes-all) will be demonstrated on a simple 1D network with two inputs. Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 6
  • First, the number of output neurons (i.e. clusters) must be selected. It is not always known, so make a reasonable estimate; it is better to use more, as unused neurons can be eliminated later. Then initialize the weights, e.g. to small random values, or randomly choose some input vectors and use their values for the weights. Then competitive learning can begin.
  • Slide 7
  • The activation of each output neuron is calculated as the weighted sum of its inputs; e.g., the activation of output neuron 1 is u_1 = w_11 x_1 + w_21 x_2. Generally, the activation is the dot product between the input vector x and the weight vector w_j. Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 8
  • The dot product is not only u_j = Σ_i x_i w_ij, but also u_j = |x| |w_j| cos θ, where θ is the angle between x and w_j. If |x| = |w_j| = 1, then u_j = cos θ. The closer these two vectors are (i.e. the smaller θ is), the bigger u_j is (cos 0 = 1).
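A quick numeric check of this property (a pure-Python sketch; the vectors `x`, `w_close`, and `w_far` are made-up illustrations): when both vectors are scaled to unit length, the activation u_j = x · w_j equals cos θ, so it is largest for the weight vector closest in direction to the input.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(dot(v, v))
    return [vi / n for vi in v]

x = normalize([1.0, 1.0])        # unit-length input vector
w_close = normalize([1.0, 0.9])  # weight vector almost parallel to x
w_far = normalize([1.0, -1.0])   # weight vector perpendicular to x

# For unit vectors the activation (dot product) equals cos(theta).
print(dot(x, w_close))  # close to 1
print(dot(x, w_far))    # close to 0
```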
  • Slide 9
  • Say it again, and loudly: the closer the weight and input vectors are, the bigger the neuron activation is. A simple measure of this closeness is the Euclidean distance between x and w_j.
  • Slide 10
  • Scale the input vector so that its length equals one (|x| = 1). Scale the weight vectors of the individual output neurons to unit length as well (|w| = 1). An input is presented to the network. Calculate how close the input vector x is to each weight vector w_j (j = 1 … number of output neurons). The neuron whose codebook is closest to the input vector becomes the winner (BMU, Best Matching Unit). Its weights will be updated.
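The winner search above can be sketched as follows (pure Python; the `codebooks` values and the input vector are invented for illustration):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def find_bmu(x, codebooks):
    """Index of the codebook vector closest to the input x."""
    distances = [euclidean(x, w) for w in codebooks]
    return distances.index(min(distances))

codebooks = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # one weight vector per output neuron
x = [0.9, 0.1]
bmu = find_bmu(x, codebooks)  # neuron 1 wins: its codebook is closest to x
```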
  • Slide 11
  • Weight update. The weight vector w is updated so that it moves closer to the input x: Δw = α (x − w), where α is the learning rate.
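A minimal sketch of this rule (the vectors and the learning-rate value are chosen only for illustration): the winner's weight vector moves a fraction α of the way toward the input.

```python
def update_winner(w, x, alpha):
    """Move the winner's weights toward the input: w <- w + alpha * (x - w)."""
    return [wi + alpha * (xi - wi) for wi, xi in zip(w, x)]

w = [0.0, 1.0]                          # winner's weight vector (made up)
x = [1.0, 0.0]                          # input vector (made up)
w_new = update_winner(w, x, alpha=0.5)  # learning rate 0.5 for illustration
print(w_new)  # [0.5, 0.5] -- halfway between w and x
```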
  • Slide 12
  • Recursive vs. batch learning, conceptually similar to online/batch learning. Recursive learning: update the weights of the winning neuron after each presentation of an input vector. Batch learning: note the weight update for each input vector; the average weight adjustment for each output neuron is applied after the whole epoch. When to terminate learning? When the mean distance between the neurons and the inputs they represent is at a minimum, i.e. the distance stops changing.
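The batch mode can be sketched like this (pure Python; the helper `bmu`, the data, and the learning rate are illustrative assumptions, not the book's code). Updates are only noted during the epoch, and the average adjustment per neuron is applied at the end:

```python
def bmu(x, codebooks):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebooks)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebooks[j])))

def batch_epoch(codebooks, data, alpha):
    """Note the adjustment for each input; apply the average per neuron
    only after the whole epoch (batch learning)."""
    sums = [[0.0] * len(w) for w in codebooks]   # accumulated (x - w) per neuron
    counts = [0] * len(codebooks)
    for x in data:
        j = bmu(x, codebooks)
        sums[j] = [s + (xi - wi) for s, xi, wi in zip(sums[j], x, codebooks[j])]
        counts[j] += 1
    return [[wi + alpha * s / counts[j] for wi, s in zip(w, sums[j])]
            if counts[j] else w
            for j, w in enumerate(codebooks)]

data = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0]]      # made-up inputs
codebooks = batch_epoch([[0.0, 0.1], [0.9, 0.9]], data, alpha=0.5)
print(codebooks)
```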
  • Slide 13
  • Example Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 14
  • Slide 15
  • Training progress per epoch. Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 16
  • Slide 17
  • Slide 18
  • Topology is not preserved. Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 19
  • Meet today's hero: Teuvo Kohonen.
  • Slide 20
  • Self-Organizing Maps. SOM, also Self-Organizing Feature Map (SOFM), Kohonen neural network. Inspired by the function of the brain: different brain regions correspond to specific aspects of human activities. These regions are organized such that tasks of a similar nature (e.g. speech and vision) are controlled by regions that are in spatial proximity to each other. This is called topology preservation.
  • Slide 21
  • In SOM learning, not only the winner but also the neighboring neurons adapt their weights. Neurons closer to the winner adjust their weights more than neurons further away. Thus we need (1) to define the neighborhood, and (2) to define how much the neighboring neurons adapt their weights.
  • Slide 22
  • Neighborhood definition: neighborhood radius r (neighborhoods of radius 1, 2, 3 around the winner). Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 23
  • Training in SOM follows a similar manner to standard winner-takes-all competitive training; however, a new rule is used for the weight changes. Suppose that the BMU is at position {i_win, j_win} on the 2D map. Then all codebook vectors of the BMU and its neighbors are adjusted according to w_j^new = w_j + α · NS · (x − w_j), where NS is the neighbor strength, which varies with the distance to the BMU, and α is the learning rate.
  • Slide 24
  • Neighbor strength. When using the neighbor feature, all neighbor codebooks are shifted towards the input vector. However, the BMU updates the most, and the further away a neighbor neuron is, the less its weights update. The NS function tells us how the weight adjustment decays with distance from the winner.
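One common choice of NS is a Gaussian of the grid distance to the winner; the following sketch assumes that choice (the grid positions, constants, and the function `som_update` are illustrative):

```python
import math

def neighbor_strength(dist, sigma):
    """Gaussian neighbor strength: 1 at the BMU, decaying with map distance."""
    return math.exp(-dist ** 2 / (2 * sigma ** 2))

def som_update(codebooks, positions, bmu_index, x, alpha, sigma):
    """Shift every codebook toward x, scaled by its grid distance to the BMU."""
    bi, bj = positions[bmu_index]
    updated = []
    for (i, j), w in zip(positions, codebooks):
        d = math.hypot(i - bi, j - bj)   # distance on the map, not in input space
        ns = neighbor_strength(d, sigma)
        updated.append([wk + alpha * ns * (xk - wk) for wk, xk in zip(w, x)])
    return updated

# 1D map: three neurons at grid positions 0, 1, 2 (made-up values)
positions = [(0, 0), (1, 0), (2, 0)]
codebooks = [[0.2, 0.2], [0.5, 0.5], [0.9, 0.9]]
updated = som_update(codebooks, positions, bmu_index=0, x=[0.0, 0.0],
                     alpha=0.5, sigma=1.0)
# The BMU moves the most; the farther a neighbor, the smaller its shift.
```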
  • Slide 25
  • Slide 26
  • Slide by Johan Everts
  • Slide 27
  • Neighbor strength functions: linear, Gaussian, exponential. Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 28
  • 2D Side Effects source: http://www.cis.hut.fi/projects/somtoolbox/download/sc_shots2.shtml
  • Slide 29
  • Shrinking neighborhood size. A large neighborhood ensures proper placement of neurons in the initial stage, to broadly represent the spatial organization of the input data. Further refinement comes from subsequent shrinking of the neighborhood: the large starting neighborhood size is reduced with the iterations. σ_0 is the initial neighborhood size, σ_t the neighborhood size at iteration t, and T the total number of iterations that brings the neighborhood to zero (i.e., only the winner remains). The decay can be linear or exponential.
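The two decay schedules can be written as small functions (a sketch; σ_0 = 3 and T = 100 are made-up values):

```python
import math

def linear_decay(sigma0, t, T):
    """Neighborhood size falls linearly from sigma0 to zero at iteration T."""
    return sigma0 * (1 - t / T)

def exponential_decay(sigma0, t, T):
    """Neighborhood size falls exponentially with the iteration number."""
    return sigma0 * math.exp(-t / T)

sigma0, T = 3.0, 100                       # made-up starting size and horizon
print(linear_decay(sigma0, 50, T))         # halfway: 1.5
print(exponential_decay(sigma0, 50, T))    # about 1.82
```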
  • Slide 30
  • Learning rate decay. The weight update incorporates both the learning rate decay and the neighborhood decay.
  • Slide 31
  • Recursive/Batch Learning. SOM in batch mode with no neighborhood is equivalent to k-means. The use of a neighborhood leads to topology preservation: regions closer in the input space are represented by neurons closer in the map.
  • Slide 32
  • Two Phases of SOM Training
  • Slide 33
  • Example contd. Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 34
  • neighborhood drops to 0 after 3 iterations Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 35
  • After 3 iterations: topology preservation takes effect very quickly. Complete training: converged after 40 epochs. Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 36
  • Complete training. All vectors have found cluster centers, except one. Solution: add one more neuron. Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 37
  • 1 2 3 6 5 4 7
  • Slide 38
  • 2D output. Play with http://www.demogng.de/ and choose 'The self-organizing map'. Update formulas are used for both the neighborhood size and the learning rate.
  • Slide 39
  • Slide 40
  • A self-organizing feature map from a square source space to a square (grid) target space. Duda, Hart, Stork, Pattern Classification, 2000
  • Slide 41
  • Some initial (random) weights and the particular sequence of patterns (randomly chosen) lead to kinks in the map; even extensive further training does not eliminate the kink. In such cases, learning should be restarted with randomized weights and possibly a wider window function and slower decay in learning. Duda, Hart, Stork, Pattern Classification, 2000
  • Slide 42
  • 2D maps of multidimensional data. Iris data set: 150 patterns, 4 attributes, 3 classes (Setosa 1, Versicolor 2, Virginica 3). There are more than 2 dimensions, so all the data cannot be visualized directly in a meaningful way. SOM can be used not only to cluster the input data, but also to explore the relationships between different attributes. SOM structure: 8x8, hexagonal, exponential decay of the learning rate (α_init = 0.5, T_max = 20x150 = 3000), NS: Gaussian.
  • Slide 43
  • What can be learned? Petal length and width have structure similar to the class panel: low length correlates with low width, and these relate to the Versicolor class. Sepal width shows a very different pattern. Class panel: the boundary between the Virginica and Setosa classes overlaps (setosa, versicolor, virginica). Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 44
  • Since we have class labels, we can assess the classification accuracy of the map. First we train the map using all 150 patterns. Then we present the input patterns individually again and note the winning neuron. The class assigned to the input is the class associated with this BMU's codebook vector (see the previous slide, Class panel). Only the winner decides the classification.
  • Slide 45
  • Only the winner decides the classification: Vers (2) 100% accuracy, Set (1) 86%, Virg (3) 88%; overall accuracy = 91.3%. A neighborhood of size 2 decides the classification: Vers (2) 100% accuracy, Set (1) 90%, Virg (3) 94%; overall accuracy = 94.7%. Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
  • Slide 46
  • U-matrix. Distances between the neighboring codebook vectors can highlight different cluster regions in the map and can be a useful visualization tool. For two neurons w_1 = {w_11, w_21, …, w_n1} and w_2 = {w_12, w_22, …, w_n2}, the Euclidean distance between them is computed. The average of the distances to the nearest neighbors gives the unified distance, the U-matrix.
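A sketch of the U value of a single neuron, taken as the average Euclidean distance to its immediate grid neighbors (the 1D map layout and codebook values are invented; a real U-matrix is computed over the whole 2D grid):

```python
import math

def u_value(index, codebooks, neighbors):
    """Average distance from one codebook to the codebooks of its grid neighbors."""
    w = codebooks[index]
    dists = [math.dist(w, codebooks[n]) for n in neighbors[index]]
    return sum(dists) / len(dists)

# 1D map of four neurons; a gap between neurons 1 and 2 marks a boundary (made up)
codebooks = [[0.0, 0.0], [0.1, 0.0], [0.9, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
u = [u_value(i, codebooks, neighbors) for i in range(4)]
# Large U values at neurons 1 and 2 reveal the boundary between the two clusters.
```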
  • Slide 47
  • The larger the distance between neurons, the larger the U value and the more separated the clusters. The lighter the color, the larger the U value. There is a large distance between this cluster (Iris versicolor) and the middle cluster (Iris setosa): large distances between codebook vectors indicate a sharp boundary between the clusters.
  • Slide 48
  • Surface graph. The height represents the distance. The 3rd row's large height = separation. The other two clusters are not separated.
  • Slide 49
  • Quantization error. A measure of the distance between codebook vectors and inputs. If for input vector x the winner is w_c, then the distortion error e can be calculated as the distance e = ‖x − w_c‖. Compute e for all input vectors and average to get the average quantization error, the average map distortion error E.
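The average map distortion error can be sketched as follows (pure Python; the data and codebook values are made up, and plain Euclidean distance is used as the error measure):

```python
import math

def quantization_error(data, codebooks):
    """Average distance from each input to its winning (closest) codebook vector."""
    total = sum(min(math.dist(x, w) for w in codebooks) for x in data)
    return total / len(data)

data = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0]]        # made-up inputs
codebooks = [[0.05, 0.05], [1.0, 1.0]]             # made-up trained codebooks
err = quantization_error(data, codebooks)
print(err)
```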
  • Slide 50
  • Iris quantization error High distortion error indicates areas where the codebook vector is relatively far from the inputs. Such information can be used to refine the map to obtain a more uniform distortion error measure if a more faithful reproduction of the input distribution from the map is desired.