Modeling and Visualization of High Dimensional...


Page 1: Modeling and Visualization of High Dimensional Data · syllabus.cs.manchester.ac.uk/pgt/COMP61021/lectures/SOM.pdf · COMP61021 Modelling and Visualization of High Dimensional Data

Self-Organizing Maps (SOM)

COMP61021 Modelling and Visualization of High Dimensional Data

Additional reading can be found from non-assessed exercises (week 9) in this course unit teaching page.

Textbook: Ch. 9 in [3]


Outline

• Introduction
• Kohonen SOM
• Learning Algorithm
• Visualization Method
• Examples
• Relevant Issues
• Conclusions


Introduction

• Self-organizing maps (SOM)

– SOM is a biologically inspired unsupervised neural network that approximates an unlimited number of input data points with a finite set of nodes arranged in a low-dimensional grid, where neighboring nodes correspond to more similar inputs.

– The model is produced by a learning algorithm that automatically orders the inputs on a one- or two-dimensional grid according to their mutual similarity.

– Useful for clustering analysis and data visualization

[Figure: three panels showing the input space, the initial weights, and the final weights]


Kohonen SOM

Competition

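Weight vector: $\mathbf{w} = (w_1, w_2)^T$

Input vector: $\mathbf{x} = (x_1, x_2)^T$ (e.g., $N = 2$)

Euclidean distance: $d_E(\mathbf{x}, \mathbf{w}) = \|\mathbf{x} - \mathbf{w}\| = \sqrt{(\mathbf{x} - \mathbf{w})^T (\mathbf{x} - \mathbf{w})}$

[Figure label: hard-wired connection]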
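The competition step can be sketched as follows. This is a minimal illustration, assuming the winner is the node whose weight vector has the smallest Euclidean distance to the input; the function names and the three example weight vectors are hypothetical, not taken from the slides.

```python
import math

def euclidean_distance(x, w):
    """d_E(x, w) = sqrt((x - w)^T (x - w)) for two equal-length vectors."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

def best_matching_unit(x, weights):
    """Return the index of the node whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: euclidean_distance(x, weights[i]))

# Hypothetical example with N = 2 and three nodes.
weights = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
x = (0.9, 0.8)
winner = best_matching_unit(x, weights)
print(winner)  # 1: the weight (1.0, 1.0) is nearest to x
```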


Kohonen SOM

Cooperation

"radius": $d_{ik} = 2$
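One way to picture the cooperation step is sketched below. It assumes that $d_{ik}$ is the distance on the lattice between node $i$ and the winner $k$, measured here as the Euclidean distance between (row, column) grid positions, and it uses a Gaussian neighbourhood function, which is a common choice but is not confirmed by the slides. All names are hypothetical.

```python
import math

def lattice_distance(i, k):
    """Euclidean distance between grid positions i and k, each a (row, col) pair."""
    return math.hypot(i[0] - k[0], i[1] - k[1])

def neighbours_within(k, rows, cols, radius):
    """Grid positions whose lattice distance to the winner k is at most `radius`."""
    return [(r, c) for r in range(rows) for c in range(cols)
            if lattice_distance((r, c), k) <= radius]

def gaussian_neighbourhood(i, k, sigma):
    """Assumed neighbourhood strength: 1 at the winner, decaying with lattice distance."""
    d = lattice_distance(i, k)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

winner = (2, 2)
hood = neighbours_within(winner, rows=5, cols=5, radius=2)
print(len(hood))  # 13 nodes lie within radius 2 of the centre of a 5 x 5 grid
print(round(gaussian_neighbourhood((2, 4), winner, sigma=2.0), 4))  # 0.6065
```

A hard cut-off (the list returned by `neighbours_within`) and a smooth Gaussian weighting are two alternatives; the Gaussian form lets distant nodes still adapt a little.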


Kohonen SOM

Adaptation

(see the algorithm on the following slides for details)
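As a rough sketch of the adaptation step, the standard Kohonen update moves each weight vector towards the input: $\mathbf{w}_i \leftarrow \mathbf{w}_i + \eta \, h_{ik} \, (\mathbf{x} - \mathbf{w}_i)$. The symbols $\eta$ (learning rate) and $h_{ik}$ (neighbourhood strength of node $i$ relative to winner $k$) are assumed notation here, and the function name is hypothetical.

```python
def adapt(w, x, eta, h):
    """Move weight vector w towards input x by the fraction eta * h."""
    return [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]

w = [0.0, 0.0]
x = [1.0, 2.0]
print(adapt(w, x, eta=0.5, h=1.0))  # winner (h = 1): [0.5, 1.0]
print(adapt(w, x, eta=0.5, h=0.5))  # neighbour (h = 0.5): [0.25, 0.5]
```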


Learning Algorithm


Learning Algorithm
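The three steps (competition, cooperation, adaptation) can be combined into a training loop. The sketch below follows the standard textbook form of Kohonen learning and may differ in detail from the algorithm on the slides. The exponential decay schedules, the default values `eta0=0.1` and `sigma0`, the random initialisation in [0, 1), and the name `train_som` are all assumptions made for illustration.

```python
import math
import random

def train_som(data, rows, cols, n_iter, eta0=0.1, sigma0=None, seed=0):
    """Train a rows x cols SOM on `data` (a list of equal-length vectors).

    Returns a dict mapping each grid position (r, c) to its weight vector.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0           # assumed initial neighbourhood width
    # Time constant chosen so that sigma shrinks from sigma0 to about 1.
    tau = n_iter / math.log(sigma0) if sigma0 > 1.0 else float(n_iter)

    # Initialisation: random weights in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}

    for t in range(n_iter):
        x = rng.choice(data)                     # sampling
        eta = eta0 * math.exp(-t / n_iter)       # decreasing step size
        sigma = sigma0 * math.exp(-t / tau)      # shrinking neighbourhood

        # Competition: the winner has the weight vector closest to x.
        winner = min(weights, key=lambda p: math.dist(weights[p], x))

        # Cooperation and adaptation: nodes near the winner move towards x.
        for pos, w in weights.items():
            d2 = (pos[0] - winner[0]) ** 2 + (pos[1] - winner[1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma * sigma))
            for j in range(dim):
                w[j] += eta * h * (x[j] - w[j])
    return weights

# Hypothetical usage: 200 points drawn uniformly from the unit square.
rng = random.Random(1)
data = [[rng.random(), rng.random()] for _ in range(200)]
som = train_som(data, rows=4, cols=4, n_iter=2000)
print(len(som))  # 16 nodes, one weight vector each
```

With a wide neighbourhood early on, the whole lattice moves together and becomes ordered; as the neighbourhood and step size shrink, individual nodes fine-tune their positions.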


Visualization Method

• In 2-D/3-D space, neurons are visualized by their positions in the weight space, which change as learning takes place. Each neuron is described by its weight vector.

• Two neurons are connected by an edge if they are direct neighbors in the neural network lattice. For 2-D/3-D data, the lattice can be displayed in the original data space by placing each neuron at its weight vector.

• The locations specified by weight vectors of neurons in a grid mimic the distribution of the training data.
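The lattice drawing described in these bullets amounts to collecting one line segment per pair of directly adjacent neurons. The sketch below assumes a rectangular grid stored as a dict from (row, column) to weight vector; the function name and the example map are hypothetical.

```python
def lattice_edges(weights, rows, cols):
    """Return line segments (w_a, w_b) for every pair of directly adjacent nodes."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:   # horizontal neighbour
                edges.append((weights[(r, c)], weights[(r, c + 1)]))
            if r + 1 < rows:   # vertical neighbour
                edges.append((weights[(r, c)], weights[(r + 1, c)]))
    return edges

# Hypothetical 2 x 3 map whose 2-D weights already sit on a regular grid.
weights = {(r, c): (float(c), float(r)) for r in range(2) for c in range(3)}
edges = lattice_edges(weights, rows=2, cols=3)
print(len(edges))  # 7 edges: 4 horizontal + 3 vertical
```

Each segment can then be drawn as a line on top of a scatter plot of the training data.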


Visualization Method


Visualization Method

• Example: U-Matrix
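A simplified U-matrix can be sketched as below: for each node, take the mean Euclidean distance between its weight vector and those of its directly adjacent nodes, so that large values mark boundaries between clusters. This one-value-per-node variant, the 4-connected neighbourhood, and the function name are assumptions; fuller U-matrix variants also store the distances between individual node pairs.

```python
import math

def u_matrix(weights, rows, cols):
    """Mean Euclidean distance from each node's weight to its 4-connected neighbours."""
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[(r, c)], weights[(nr, nc)]))
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u

# Hypothetical 1 x 4 map: two tight groups of nodes separated by a gap.
weights = {(0, 0): (0.0, 0.0), (0, 1): (0.1, 0.0),
           (0, 2): (1.0, 0.0), (0, 3): (1.1, 0.0)}
u = u_matrix(weights, rows=1, cols=4)
print([round(v, 2) for v in u[0]])  # [0.1, 0.5, 0.5, 0.1]
```

The two middle nodes sit on either side of the gap, so their values are the highest; in a grey-scale rendering they would appear as a ridge separating the two groups.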


Examples

• Example 1: 1-D self-organizing map


Examples

• Example 2: 2-D self-organizing map


Examples

• Example 3: self-organizing map of synthetic data sets

After SOM learning has converged, we obtain SOMs for different data distributions.

The grid mimics the data distribution!


Examples

• Example 4: Taxonomy of animals

Animal names and their attributes

                   Dove  Hen   Duck  Goose Owl   Hawk  Eagle Fox   Dog   Wolf  Cat   Tiger Lion  Horse Zebra Cow
is       Small     1     1     1     1     1     1     0     0     0     0     1     0     0     0     0     0
         Medium    0     0     0     0     0     0     1     1     1     1     0     0     0     0     0     0
         Big       0     0     0     0     0     0     0     0     0     0     0     1     1     1     1     1
has      2 legs    1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
         4 legs    0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hair      0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hooves    0     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1
         Mane      0     0     0     0     0     0     0     0     0     1     0     0     1     1     1     0
         Feathers  1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
likes to Hunt      0     0     0     0     1     1     1     1     0     1     1     1     1     0     0     0
         Run       0     0     0     0     0     0     0     0     1     1     0     1     1     1     1     0
         Fly       1     0     0     1     1     1     1     0     0     0     0     0     0     0     0     0
         Swim      0     0     1     1     0     0     0     0     0     0     0     0     0     0     0     0

A grouping with SOM according to similarity has emerged.

[Figure: the resulting map, with regions labelled "birds", "peaceful", and "hunters"]
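To see why similar animals can end up on nearby nodes, it helps to compare their input vectors directly. The sketch below encodes the attribute table above as 13-dimensional binary vectors and counts the attributes on which two animals differ. It does not train a SOM; the Hamming distance and the helper names are assumptions chosen for illustration.

```python
ANIMALS = ["Dove", "Hen", "Duck", "Goose", "Owl", "Hawk", "Eagle", "Fox",
           "Dog", "Wolf", "Cat", "Tiger", "Lion", "Horse", "Zebra", "Cow"]

# One string of 16 bits per attribute, in the column order of ANIMALS.
ATTRIBUTES = {
    "Small":    "1111110000100000",
    "Medium":   "0000001111000000",
    "Big":      "0000000000011111",
    "2 legs":   "1111111000000000",
    "4 legs":   "0000000111111111",
    "Hair":     "0000000111111111",
    "Hooves":   "0000000000000111",
    "Mane":     "0000000001001110",
    "Feathers": "1111111000000000",
    "Hunt":     "0000111101111000",
    "Run":      "0000000011011110",
    "Fly":      "1001111000000000",
    "Swim":     "0011000000000000",
}

def feature_vector(name):
    """13-dimensional binary input vector for one animal."""
    col = ANIMALS.index(name)
    return [int(bits[col]) for bits in ATTRIBUTES.values()]

def hamming(a, b):
    """Number of attributes on which two animals differ."""
    return sum(x != y for x, y in zip(feature_vector(a), feature_vector(b)))

print(hamming("Dove", "Hen"))     # 1: they differ only in Fly
print(hamming("Horse", "Zebra"))  # 0: identical columns in this table
print(hamming("Dove", "Cow"))     # 8: no attribute in common
```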


Relevant Issues


Relevant Issues

• SOM extension

– PSOM: continuous projection: interpolation between centroid locations via parameterisation

– disSOM: SOM working on distance between objects; more general than distance Nonnegative Matrix Factorization

– Hierarchical SOM: extension from single to multiple layers for multi-scale data analysis

– Generative topographic map (GTM): a probabilistic counterpart of the SOM that is provably convergent and requires neither a shrinking neighborhood nor a decreasing step size.

– Kernel SOM: overcomes two major limitations of Kohonen SOM


Conclusions

• Kohonen SOM is a biologically inspired neural network for high dimensional data clustering and visualisation.
• Its most important property is topology preservation.
• Learning involves two phases: ordering and convergence.
• There is no guarantee that SOM always converges, and hence parameter tuning is needed.
• There are several variants or extensions, which tend to overcome the limitations of the SOM.
• There are a number of successful applications of SOM.