SOM

java.lang.Object
- smile.vq.SOM

All Implemented Interfaces:

Clustering<double[]>
```
public class SOM
extends java.lang.Object
implements Clustering<double[]>
```
Self-Organizing Map. An SOM is a unsupervised learning method to produce a low-dimensional (typically two-dimensional) discretized representation (called a map) of the input space of the training samples. The model was first described as an artificial neural network by Teuvo Kohonen, and is sometimes called a Kohonen map.
While it is typical to consider SOMs as related to feed-forward networks where the nodes are visualized as being attached, this type of architecture is fundamentally different in arrangement and motivation because SOMs use a neighborhood function to preserve the topological properties of the input space. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling.
SOMs belong to a large family of competitive learning process and vector quantization. An SOM consists of components called nodes or neurons. Associated with each node is a weight vector of the same dimension as the input data vectors and a position in the map space. The usual arrangement of nodes is a regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher dimensional input space to a lower dimensional map space. During the (iterative) learning, the input vectors are compared to the weight vector of each neuron. Neurons who most closely match the input are known as the best match unit (BMU) of the system. The weight vector of the BMU and those of nearby neurons are adjusted to be closer to the input vector by a certain step size.
There are two ways to interpret a SOM. Because in the training phase weights of the whole neighborhood are moved in the same direction, similar items tend to excite adjacent neurons. Therefore, SOM forms a semantic map where similar samples are mapped close together and dissimilar apart. The other way is to think of neuronal weights as pointers to the input space. They form a discrete approximation of the distribution of training samples. More neurons point to regions with high training sample concentration and fewer where the samples are scarce.
SOM may be considered a nonlinear generalization of Principal components analysis (PCA). It has been shown, using both artificial and real geophysical data, that SOM has many advantages over the conventional feature extraction methods such as Empirical Orthogonal Functions (EOF) or PCA.
It has been shown that while SOMs with a small number of nodes behave in a way that is similar to K-means. However, larger SOMs rearrange data in a way that is fundamentally topological in character and display properties which are emergent. Therefore, large maps are preferable to smaller ones. In maps consisting of thousands of nodes, it is possible to perform cluster operations on the map itself.
A common way to display SOMs is the heat map of U-matrix. The U-matrix value of a particular node is the minimum/maximum/average distance between the node and its closest neighbors. In a rectangular grid for instance, we might consider the closest 4 or 8 nodes.
References
1. Teuvo KohonenDan. Self-organizing maps. Springer, 3rd edition, 2000.

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

static class SOM.Neuron
Self-Organizing Map Neuron.

Nested Classes
Modifier and Type	Class and Description
`static class`	`SOM.Neuron` Self-Organizing Map Neuron.

Field Summary
- Fields inherited from interface smile.clustering.Clustering
  OUTLIER

Constructor Summary

Constructors
Constructor and Description

SOM(double[][] data, int size)
Constructor.

SOM(double[][] data, int width, int height)
Constructor.

Constructors
Constructor and Description
`SOM(double[][] data, int size)` Constructor.
`SOM(double[][] data, int width, int height)` Constructor.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`int[][]`	`bmu()` Returns the best matched unit for each sample.
`int[][]`	`getClusterLabel()` Returns the cluster labels for each neuron.
`double[][][]`	`map()` Returns the SOM map grid.
`int[]`	`partition(int k)` Clustering the neurons into k groups.
`int`	`predict(double[] x)` Cluster a new instance to the nearest neuron.
`int[][]`	`size()` Returns the number of samples in each unit.
`double[][]`	`umatrix()` Returns the U-Matrix of SOM map for visualization.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - SOM
```
public SOM(double[][] data,
           int size)
```
    Constructor. Learn the SOM of given data.
    
    Parameters:
    
    size - the size of a squared map.
  - SOM
```
public SOM(double[][] data,
           int width,
           int height)
```
    Constructor. Learn the SOM of given data.
    
    Parameters:
    
    width - the width of map.
    
    height - the height of map.
- Method Detail
  - map
```
public double[][][] map()
```
    Returns the SOM map grid.
  - umatrix
```
public double[][] umatrix()
```
    Returns the U-Matrix of SOM map for visualization.
  - bmu
```
public int[][] bmu()
```
    Returns the best matched unit for each sample.
    
    Returns:
    
    the best matched unit. This is n-by-2 matrix, of which each row is for each data point. The entry bmu[i][0] and bmu[i][1] are the row index and column index of the best matched unit for each sample, respectively.
  - size
```
public int[][] size()
```
    Returns the number of samples in each unit.
  - getClusterLabel
```
public int[][] getClusterLabel()
```
    Returns the cluster labels for each neuron. If the neurons have not been clustered, throws an Illegal State Exception.
  - partition
```
public int[] partition(int k)
```
    Clustering the neurons into k groups. And then assigns the samples in each neuron to the corresponding cluster.
    
    Parameters:
    
    k - the number of clusters.
    
    Returns:
    
    the cluster label of samples.
  - predict
```
public int predict(double[] x)
```
    Cluster a new instance to the nearest neuron. For clustering purpose, one should build a sufficient large map to capture the structure of data space. Then the neurons of map can be clustered into a small number of clusters. Finally the sample should be assign to the cluster of its nearest neurons.
    
    Specified by:
    
    predict in interface Clustering<double[]>
    
    Parameters:
    
    x - a new instance.
    
    Returns:
    
    the cluster label. If the method partition(int) is called before, this is the cluster label of the nearest neuron. Otherwise, this is the index of neuron (i * width + j).

Class SOM

References

Nested Class Summary

Field Summary

Fields inherited from interface smile.clustering.Clustering

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Constructor Detail

SOM

SOM

Method Detail

map

umatrix

bmu

size

getClusterLabel

partition

predict