public class SIB extends PartitionClustering<double[]>
In analogy to K-Means, SIB's update formulas are essentially the same as those of the EM algorithm for estimating a finite Gaussian mixture model, with the regular Euclidean distance replaced by the Kullback-Leibler divergence, which is a better dissimilarity measure for co-occurrence data. However, the common batch update rule of K-Means (assign all instances to their nearest centroids, then update the centroids) does not work for SIB, which has to proceed sequentially: each instance is reassigned (if the move improves the objective) and the affected centroids are updated immediately. This is probably because the K-L divergence is very sensitive, so under a batch update rule the centroids may change drastically within a single iteration.
Note that this implementation differs slightly from the original paper, in which a weighted Jensen-Shannon divergence is employed as the criterion for assigning a randomly picked sample to a different cluster. In our experience this does not work well in some cases, probably because the weighted JS divergence gives too much weight to the clusters, which are much larger than a single sample. This implementation therefore uses the regular, unweighted Jensen-Shannon divergence instead.
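For illustration, here is a minimal usage sketch against the constructors and methods documented below. It assumes the class lives in the smile.clustering package (as in the Smile library) and uses a tiny made-up data matrix whose rows are already normalized to sum to 1, as the constructors require.

```java
import smile.clustering.SIB;

public class SIBExample {
    public static void main(String[] args) {
        // Toy normalized co-occurrence data: each row sums to 1,
        // as required by the constructors documented below.
        double[][] data = {
            {0.7, 0.2, 0.1},
            {0.6, 0.3, 0.1},
            {0.1, 0.2, 0.7},
            {0.1, 0.3, 0.6}
        };

        // Partition the data into k = 2 clusters with at most 100 iterations.
        SIB sib = new SIB(data, 2, 100);

        // Inspect the result: distortion of the partition and the centroids.
        System.out.println("distortion = " + sib.distortion());
        double[][] centroids = sib.centroids();
        System.out.println("number of centroids = " + centroids.length);

        // Assign a new (normalized) instance to its nearest cluster.
        double[] x = {0.65, 0.25, 0.10};
        System.out.println("cluster of x = " + sib.predict(x));
    }
}
```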
Fields inherited from class PartitionClustering: k, size, y
Fields inherited from interface Clustering: OUTLIER
| Constructor and Description |
|---|
| SIB(double[][] data, int k) Constructor. |
| SIB(double[][] data, int k, int maxIter) Constructor. |
| SIB(double[][] data, int k, int maxIter, int runs) Constructor. |
| SIB(smile.data.SparseDataset data, int k) Constructor. |
| SIB(smile.data.SparseDataset data, int k, int maxIter) Constructor. |
| SIB(smile.data.SparseDataset data, int k, int maxIter, int runs) Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| double[][] | centroids() Returns the centroids. |
| double | distortion() Returns the distortion. |
| int | predict(double[] x) Cluster a new instance. |
| int | predict(smile.math.SparseArray x) Cluster a new instance. |
| java.lang.String | toString() |
Methods inherited from class PartitionClustering: getClusterLabel, getClusterSize, getNumClusters, seed, seed
public SIB(double[][] data, int k)

Parameters:
    data - the normalized co-occurrence input data of which each row is a sample with sum 1.
    k - the number of clusters.

public SIB(double[][] data, int k, int maxIter)

Parameters:
    data - the input data of which each row is a sample.
    k - the number of clusters.
    maxIter - the maximum number of iterations.

public SIB(double[][] data, int k, int maxIter, int runs)

Parameters:
    data - the input data of which each row is a sample.
    k - the number of clusters.
    maxIter - the maximum number of iterations.
    runs - the number of runs of the SIB algorithm.

public SIB(smile.data.SparseDataset data, int k)

Parameters:
    data - the sparse normalized co-occurrence dataset of which each row is a sample with sum 1.
    k - the number of clusters.

public SIB(smile.data.SparseDataset data, int k, int maxIter)

Parameters:
    data - the sparse normalized co-occurrence dataset of which each row is a sample with sum 1.
    k - the number of clusters.
    maxIter - the maximum number of iterations.

public SIB(smile.data.SparseDataset data, int k, int maxIter, int runs)

Parameters:
    data - the sparse normalized co-occurrence dataset of which each row is a sample with sum 1.
    k - the number of clusters.
    maxIter - the maximum number of iterations.
    runs - the number of runs of the SIB algorithm.
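As a second sketch, the constructors that take a runs argument allow multiple random restarts. The following hypothetical comparison of a single run against ten runs uses only the constructors documented above (again assuming the smile.clustering package and rows normalized to sum to 1):

```java
import smile.clustering.SIB;

public class SIBRestartExample {
    public static void main(String[] args) {
        // Illustrative normalized co-occurrence data; each row sums to 1.
        double[][] data = {
            {0.5, 0.4, 0.1},
            {0.4, 0.5, 0.1},
            {0.1, 0.1, 0.8},
            {0.2, 0.1, 0.7}
        };

        // Single run, at most 100 iterations.
        SIB single = new SIB(data, 2, 100);

        // Ten runs with different random initializations; presumably the
        // run with the lowest distortion is the one that is returned.
        SIB multi = new SIB(data, 2, 100, 10);

        System.out.println("single-run distortion = " + single.distortion());
        System.out.println("multi-run  distortion = " + multi.distortion());
    }
}
```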
public int predict(double[] x)

Parameters:
    x - a new instance.

public int predict(smile.math.SparseArray x)

Parameters:
    x - a new instance.

public double distortion()

public double[][] centroids()

public java.lang.String toString()

Overrides:
    toString in class java.lang.Object