public class KMeans extends PartitionClustering<double[]>
However, the k-means algorithm has at least two major theoretic shortcomings:
We also use k-d trees to speed up each k-means step as described in the filter algorithm by Kanungo, et al.
K-means is a hard clustering method, i.e. each sample is assigned to a specific cluster. In contrast, soft clustering, e.g. the Expectation-Maximization algorithm for Gaussian mixtures, assign samples to different clusters with different probabilities.
k, size, y
OUTLIER
Constructor and Description |
---|
KMeans(double[][] data,
int k)
Constructor.
|
KMeans(double[][] data,
int k,
int maxIter)
Constructor.
|
KMeans(double[][] data,
int k,
int maxIter,
int runs)
Clustering data into k clusters.
|
Modifier and Type | Method and Description |
---|---|
double[][] |
centroids()
Returns the centroids.
|
double |
distortion()
Returns the distortion.
|
static KMeans |
lloyd(double[][] data,
int k)
The implementation of Lloyd algorithm as a benchmark.
|
static KMeans |
lloyd(double[][] data,
int k,
int maxIter)
The implementation of Lloyd algorithm as a benchmark.
|
static KMeans |
lloyd(double[][] data,
int k,
int maxIter,
int runs)
The implementation of Lloyd algorithm as a benchmark.
|
int |
predict(double[] x)
Cluster a new instance.
|
java.lang.String |
toString() |
getClusterLabel, getClusterSize, getNumClusters, seed, seed
public KMeans(double[][] data, int k)
data
- the input data of which each row is a sample.k
- the number of clusters.public KMeans(double[][] data, int k, int maxIter)
data
- the input data of which each row is a sample.k
- the number of clusters.maxIter
- the maximum number of iterations for each running.public KMeans(double[][] data, int k, int maxIter, int runs)
data
- the input data of which each row is a sample.k
- the number of clusters.maxIter
- the maximum number of iterations for each running.runs
- the number of runs of K-Means algorithm.public double distortion()
public double[][] centroids()
public int predict(double[] x)
x
- a new instance.public static KMeans lloyd(double[][] data, int k)
data
- the input data of which each row is a sample.k
- the number of clusters.public static KMeans lloyd(double[][] data, int k, int maxIter)
data
- the input data of which each row is a sample.k
- the number of clusters.maxIter
- the maximum number of iterations for each running.public static KMeans lloyd(double[][] data, int k, int maxIter, int runs)
data
- the input data of which each row is a sample.k
- the number of clusters.maxIter
- the maximum number of iterations for each running.runs
- the number of runs of K-Means algorithm.public java.lang.String toString()
toString
in class java.lang.Object