T - the type of input object.
public abstract class PartitionClustering<T> extends java.lang.Object implements Clustering<T>
Modifier and Type | Field and Description |
---|---|
protected int | k - The number of clusters. |
protected int[] | size - The number of samples in each cluster. |
protected int[] | y - The cluster labels of data. |
Fields inherited from interface Clustering: OUTLIER
Constructor and Description |
---|
PartitionClustering() |
Modifier and Type | Method and Description |
---|---|
int[] | getClusterLabel() - Returns the cluster labels of data. |
int[] | getClusterSize() - Returns the size of clusters. |
int | getNumClusters() - Returns the number of clusters. |
static <T> double | seed(smile.math.distance.Distance<T> distance, T[] data, T[] medoids, int[] y, double[] d) - Initialize cluster membership of input objects with KMeans++ algorithm. |
static int[] | seed(double[][] data, int k, ClusteringDistance distance) - Initialize cluster membership of input objects with KMeans++ algorithm. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Clustering: predict
protected int k
protected int[] y
protected int[] size
public int getNumClusters()
public int[] getClusterLabel()
public int[] getClusterSize()
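The label and size arrays returned by these getters are related: one holds a label per sample, the other a count per cluster. The following is a minimal sketch of that relationship, not the library's code. It assumes zero-based labels in the range [0, k) and skips any other value (for example an outlier marker); the class and method names are hypothetical.

```java
/** Illustrative sketch only; the class and method names are hypothetical. */
class ClusterSizeSketch {

    /** Counts how many samples carry each label in [0, k). */
    static int[] clusterSizes(int[] y, int k) {
        int[] size = new int[k];
        for (int label : y) {
            // Assumption: labels outside [0, k) do not belong to any cluster.
            if (label >= 0 && label < k) {
                size[label]++;
            }
        }
        return size;
    }
}
```

For labels {0, 1, 1, 2, 0, 1} and k = 3, the sketch yields the sizes {2, 3, 1}.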
public static int[] seed(double[][] data, int k, ClusteringDistance distance)
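K-Means++ is based on the intuition of spreading the k initial cluster centers away from each other. The first cluster center is chosen uniformly at random from the data points being clustered. Each subsequent cluster center is then chosen from the remaining data points with probability proportional to the squared distance from that point to its closest existing cluster center.
The exact algorithm is as follows:
1. Choose one center uniformly at random from among the data points.
2. For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution in which a point x is chosen with probability proportional to D(x)^2.
4. Repeat steps 2 and 3 until k centers have been chosen.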
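The seeding procedure can be sketched in plain Java as follows. This is a minimal illustration, not the library's implementation. It assumes squared Euclidean distance in place of the ClusteringDistance parameter and takes an explicit java.util.Random for reproducibility; the class name KMeansPlusPlusSketch and the helper squaredDistance are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only; not the library's implementation. */
class KMeansPlusPlusSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Picks k centers with K-Means++ and returns, for each point,
     * the index (0..k-1) of its nearest chosen center.
     */
    static int[] seed(double[][] data, int k, Random rng) {
        int n = data.length;
        int[] y = new int[n];        // label of the nearest center so far
        double[] d = new double[n];  // squared distance to that center
        Arrays.fill(d, Double.MAX_VALUE);

        // The first center is chosen uniformly at random.
        double[] center = data[rng.nextInt(n)];

        for (int j = 0; j < k; j++) {
            // Update each point's nearest center against center j.
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                double dist = squaredDistance(data[i], center);
                if (dist < d[i]) {
                    d[i] = dist;
                    y[i] = j;
                }
                total += d[i];
            }
            if (j == k - 1) {
                break; // all k centers have been chosen
            }

            // Sample the next center with probability proportional to d[i].
            double cutoff = rng.nextDouble() * total;
            double cumulative = 0.0;
            int next = n - 1; // fallback in case of floating-point rounding
            for (int i = 0; i < n; i++) {
                cumulative += d[i];
                if (cumulative > cutoff) {
                    next = i;
                    break;
                }
            }
            center = data[next];
        }
        return y;
    }
}
```

Calling `KMeansPlusPlusSketch.seed(data, 2, new Random(42))` on an n-by-p array returns n labels, each 0 or 1. Points that are already centers have distance zero and so are not drawn again while any other point has a positive distance.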
data - data objects to be clustered.
k - the number of clusters.
public static <T> double seed(smile.math.distance.Distance<T> distance, T[] data, T[] medoids, int[] y, double[] d)
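K-Means++ is based on the intuition of spreading the k initial cluster centers away from each other. The first cluster center is chosen uniformly at random from the data points being clustered. Each subsequent cluster center is then chosen from the remaining data points with probability proportional to the squared distance from that point to its closest existing cluster center.
The exact algorithm is as follows:
1. Choose one center uniformly at random from among the data points.
2. For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution in which a point x is chosen with probability proportional to D(x)^2.
4. Repeat steps 2 and 3 until k centers have been chosen.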
T - the type of input object.
data - data objects array of size n.
medoids - an array of size k to store cluster medoids on output.
y - an array of size n to store cluster labels on output.
d - an array of size n to store the distance of each sample to its nearest medoid.
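The medoid-based variant can be sketched for arbitrary object types as shown below. This is an illustration under stated assumptions, not the library's code. A java.util.function.ToDoubleBiFunction stands in for smile.math.distance.Distance, an explicit java.util.Random is added for reproducibility, and the returned double is assumed to be the sum of distances to the nearest medoid, since the description above does not say what the return value represents. The class name MedoidSeedingSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

/** Illustrative sketch only; not the library's implementation. */
class MedoidSeedingSketch {

    /**
     * Fills medoids (size k), y (size n) and d (size n) in place.
     * Returns the sum of d, which is an assumption of this sketch.
     */
    static <T> double seed(ToDoubleBiFunction<T, T> distance, T[] data,
                           T[] medoids, int[] y, double[] d, Random rng) {
        int n = data.length;
        int k = medoids.length;
        Arrays.fill(d, Double.MAX_VALUE);

        // The first medoid is chosen uniformly at random.
        medoids[0] = data[rng.nextInt(n)];

        for (int j = 0; j < k; j++) {
            // Refresh nearest-medoid labels and distances against medoid j.
            double weightSum = 0.0;
            for (int i = 0; i < n; i++) {
                double dist = distance.applyAsDouble(data[i], medoids[j]);
                if (dist < d[i]) {
                    d[i] = dist;
                    y[i] = j;
                }
                weightSum += d[i] * d[i];
            }
            if (j == k - 1) {
                break; // all k medoids have been chosen
            }

            // Sample the next medoid with probability proportional to d[i]^2.
            double cutoff = rng.nextDouble() * weightSum;
            double cumulative = 0.0;
            int next = n - 1; // fallback in case of floating-point rounding
            for (int i = 0; i < n; i++) {
                cumulative += d[i] * d[i];
                if (cumulative > cutoff) {
                    next = i;
                    break;
                }
            }
            medoids[j + 1] = data[next];
        }

        double total = 0.0;
        for (double v : d) {
            total += v;
        }
        return total;
    }
}
```

The caller allocates the output arrays, so the length of medoids determines k. After the call, `d[i]` equals the distance between `data[i]` and `medoids[y[i]]`.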