PartitionClustering

java.lang.Object
- smile.clustering.PartitionClustering<T>

Type Parameters:

T - the type of input object.

All Implemented Interfaces:

java.io.Serializable, Clustering<T>

Direct Known Subclasses:

CLARANS, DBSCAN, DENCLUE, KMeans, MEC, NeuralGas, SIB
```
public abstract class PartitionClustering<T>
extends java.lang.Object
implements Clustering<T>
```
Abstract class of partition clustering. Partition methods break the observation into distinct non-overlapping groups.

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`protected int`	`k` The number of clusters.
`protected int[]`	`size` The number of samples in each cluster.
`protected int[]`	`y` The cluster labels of data.

Fields inherited from interface smile.clustering.Clustering
OUTLIER

Constructor Summary

Constructors
Constructor and Description

PartitionClustering()

Constructors
Constructor and Description
`PartitionClustering()`

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`int[]`	`getClusterLabel()` Returns the cluster labels of data.
`int[]`	`getClusterSize()` Returns the size of clusters.
`int`	`getNumClusters()` Returns the number of clusters.
`static <T> double`	`seed(smile.math.distance.Distance<T> distance, T[] data, T[] medoids, int[] y, double[] d)` Initialize cluster membership of input objects with KMeans++ algorithm.
`static int[]`	`seed(double[][] data, int k, ClusteringDistance distance)` Initialize cluster membership of input objects with KMeans++ algorithm.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface smile.clustering.Clustering
predict

- Field Detail
  - k
```
protected int k
```
    The number of clusters.
  - y
```
protected int[] y
```
    The cluster labels of data.
  - size
```
protected int[] size
```
    The number of samples in each cluster.
- Constructor Detail
  - PartitionClustering
```
public PartitionClustering()
```
- Method Detail
  - getNumClusters
```
public int getNumClusters()
```
    Returns the number of clusters.
  - getClusterLabel
```
public int[] getClusterLabel()
```
    Returns the cluster labels of data.
  - getClusterSize
```
public int[] getClusterSize()
```
    Returns the size of clusters.
  - seed
```
public static int[] seed(double[][] data,
                         int k,
                         ClusteringDistance distance)
```
    Initialize cluster membership of input objects with KMeans++ algorithm. Many clustering methods, e.g. k-means, need a initial clustering configuration as a seed.
    K-Means++ is based on the intuition of spreading the k initial cluster centers away from each other. The first cluster center is chosen uniformly at random from the data points that are being clustered, after which each subsequent cluster center is chosen from the remaining data points with probability proportional to its distance squared to the point's closest cluster center.
    The exact algorithm is as follows:
    1. Choose one center uniformly at random from among the data points.
    2. For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
    3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D²(x).
    4. Repeat Steps 2 and 3 until k centers have been chosen.
    5. Now that the initial centers have been chosen, proceed using standard k-means clustering.
    This seeding method gives out considerable improvements in the final error of k-means. Although the initial selection in the algorithm takes extra time, the k-means part itself converges very fast after this seeding and thus the algorithm actually lowers the computation time too.
    References
    1. D. Arthur and S. Vassilvitskii. "K-means++: the advantages of careful seeding". ACM-SIAM symposium on Discrete algorithms, 1027-1035, 2007.
    2. Anna D. Peterson, Arka P. Ghosh and Ranjan Maitra. A systematic evaluation of different methods for initializing the K-means clustering algorithm. 2010.
    Parameters:
    
    data - data objects to be clustered.
    
    k - the number of cluster.
    
    Returns:
    
    the cluster labels.
  - seed
```
public static <T> double seed(smile.math.distance.Distance<T> distance,
                              T[] data,
                              T[] medoids,
                              int[] y,
                              double[] d)
```
    Initialize cluster membership of input objects with KMeans++ algorithm. Many clustering methods, e.g. k-means, need a initial clustering configuration as a seed.
    K-Means++ is based on the intuition of spreading the k initial cluster centers away from each other. The first cluster center is chosen uniformly at random from the data points that are being clustered, after which each subsequent cluster center is chosen from the remaining data points with probability proportional to its distance squared to the point's closest cluster center.
    The exact algorithm is as follows:
    1. Choose one center uniformly at random from among the data points.
    2. For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
    3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D²(x).
    4. Repeat Steps 2 and 3 until k centers have been chosen.
    5. Now that the initial centers have been chosen, proceed using standard k-means clustering.
    This seeding method gives out considerable improvements in the final error of k-means. Although the initial selection in the algorithm takes extra time, the k-means part itself converges very fast after this seeding and thus the algorithm actually lowers the computation time too.
    References
    1. D. Arthur and S. Vassilvitskii. "K-means++: the advantages of careful seeding". ACM-SIAM symposium on Discrete algorithms, 1027-1035, 2007.
    2. Anna D. Peterson, Arka P. Ghosh and Ranjan Maitra. A systematic evaluation of different methods for initializing the K-means clustering algorithm. 2010.
    Type Parameters:
    
    T - the type of input object.
    
    Parameters:
    
    data - data objects array of size n.
    
    medoids - an array of size k to store cluster medoids on output.
    
    y - an array of size n to store cluster labels on output.
    
    d - an array of size n to store the distance of each sample to nearest medoid.
    
    Returns:
    
    the initial cluster distortion.

Class PartitionClustering<T>

Field Summary

Fields inherited from interface smile.clustering.Clustering

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Methods inherited from interface smile.clustering.Clustering

Field Detail

k

y

size

Constructor Detail

PartitionClustering

Method Detail

getNumClusters

getClusterLabel

getClusterSize

seed

References

seed

References