org.apache.commons.math.stat.clustering
Class KMeansPlusPlusClusterer<T extends Clusterable<T>>

java.lang.Object
  extended by org.apache.commons.math.stat.clustering.KMeansPlusPlusClusterer<T>
Type Parameters:
T - type of the points to cluster
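The bound T extends Clusterable<T> is self-referential: a point type is clustered only against points of its own type, so generic code can call methods that take and return T without casts. The sketch below shows that pattern with a hypothetical ClusterablePoint interface, an XYPoint class and a BoundDemo helper. The method names distanceFrom and centroidOf are assumptions chosen for illustration. They are not documented on this page.

```java
import java.util.Collection;

// Hypothetical stand-in for a Clusterable-style interface (names are assumptions):
// a point can measure its distance to another point of the same type and
// build the centroid of a group of such points.
interface ClusterablePoint<T extends ClusterablePoint<T>> {
    double distanceFrom(T other);
    T centroidOf(Collection<T> points);
}

// A 2-D point that satisfies the self-referential bound.
final class XYPoint implements ClusterablePoint<XYPoint> {
    final double x;
    final double y;

    XYPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public double distanceFrom(XYPoint other) {
        return Math.hypot(x - other.x, y - other.y);  // Euclidean distance
    }

    @Override
    public XYPoint centroidOf(Collection<XYPoint> points) {
        double sx = 0;
        double sy = 0;
        for (XYPoint p : points) {
            sx += p.x;
            sy += p.y;
        }
        return new XYPoint(sx / points.size(), sy / points.size());
    }
}

final class BoundDemo {
    // Works for any T with the self-referential bound: returns the candidate closest to p.
    static <T extends ClusterablePoint<T>> T nearest(T p, Collection<T> candidates) {
        T best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (T c : candidates) {
            double d = p.distanceFrom(c);
            if (d < bestDistance) {
                bestDistance = d;
                best = c;
            }
        }
        return best;
    }
}
```

Because XYPoint implements ClusterablePoint<XYPoint>, BoundDemo.nearest can be called with XYPoint arguments and returns an XYPoint directly.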

public class KMeansPlusPlusClusterer<T extends Clusterable<T>>
extends Object

Clustering algorithm based on the k-means++ algorithm of David Arthur and Sergei Vassilvitskii.

Since:
2.0
Version:
$Revision: 1054333 $ $Date: 2011-01-02 01:34:58 +0100 (Sun, 02 Jan 2011) $
See Also:
K-means++ (Wikipedia)
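k-means++ differs from plain k-means in how the initial centers are chosen. The first center is picked uniformly at random. Each later center is picked with probability proportional to D(x)^2, the squared distance from x to the nearest center already chosen. The following is a minimal, self-contained sketch of that seeding step on points stored as double[]. The class and method names are hypothetical, and this is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper illustrating k-means++ seeding (D^2 sampling).
final class KMeansPlusPlusSeeding {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Chooses k initial centers from points using the given random generator.
    static List<double[]> chooseInitialCenters(List<double[]> points, int k, Random random) {
        int n = points.size();
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<double[]> centers = new ArrayList<>();
        centers.add(points.get(random.nextInt(n)));  // first center: uniform choice

        // minDist[i] holds the squared distance from point i to its nearest chosen center.
        double[] minDist = new double[n];
        for (int i = 0; i < n; i++) {
            minDist[i] = squaredDistance(points.get(i), centers.get(0));
        }

        while (centers.size() < k) {
            double total = 0;
            for (double d : minDist) {
                total += d;
            }
            int chosen;
            if (total == 0) {
                // Every point coincides with a center; fall back to a uniform pick (may repeat a center).
                chosen = random.nextInt(n);
            } else {
                // Sample index i with probability minDist[i] / total.
                double r = random.nextDouble() * total;
                double cumulative = 0;
                chosen = -1;
                for (int i = 0; i < n; i++) {
                    if (minDist[i] > 0) {
                        chosen = i;  // last positive-weight index doubles as a rounding fallback
                    }
                    cumulative += minDist[i];
                    if (cumulative > r) {
                        break;
                    }
                }
            }
            double[] center = points.get(chosen);
            centers.add(center);
            // Refresh nearest-center distances with the newly added center.
            for (int i = 0; i < n; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points.get(i), center));
            }
        }
        return centers;
    }
}
```

Passing a java.util.Random built with a fixed seed makes the choice of initial centers reproducible from run to run.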

Nested Class Summary
static class KMeansPlusPlusClusterer.EmptyClusterStrategy
          Strategies to use for replacing an empty cluster.
 
Constructor Summary
KMeansPlusPlusClusterer(Random random)
          Build a clusterer.
KMeansPlusPlusClusterer(Random random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
          Build a clusterer.
 
Method Summary
 List<Cluster<T>> cluster(Collection<T> points, int k, int maxIterations)
          Runs the K-means++ clustering algorithm.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(Random random)
Build a clusterer.

The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.

Parameters:
random - random generator to use for choosing initial centers
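The following sketch makes the default empty-cluster rule concrete. For each cluster, it computes the variance of the distances from the cluster's members to its center, then returns the index of the cluster where that variance is largest. It is a hypothetical helper on double[] points. Using the population variance (dividing by n) is an assumption. The helper only selects the cluster; how the selected cluster is then split is not specified on this page.

```java
import java.util.List;

// Hypothetical helper: finds the cluster whose member-to-center distances vary the most.
final class LargestVarianceSelector {

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);  // Euclidean distance
    }

    // Population variance of the distances from each member to the center.
    static double distanceVariance(double[] center, List<double[]> members) {
        int n = members.size();
        if (n < 2) {
            return 0.0;  // zero or one point: no spread to measure
        }
        double mean = 0;
        for (double[] p : members) {
            mean += distance(p, center);
        }
        mean /= n;
        double sumSq = 0;
        for (double[] p : members) {
            double dev = distance(p, center) - mean;
            sumSq += dev * dev;
        }
        return sumSq / n;
    }

    // Returns the index of the cluster with the largest distance variance,
    // or -1 if no cluster has a positive variance.
    static int indexOfLargestVariance(List<double[]> centers, List<List<double[]>> members) {
        int best = -1;
        double bestVariance = 0;
        for (int c = 0; c < centers.size(); c++) {
            double v = distanceVariance(centers.get(c), members.get(c));
            if (v > bestVariance) {
                best = c;
                bestVariance = v;
            }
        }
        return best;
    }
}
```

An implementation could instead use the bias-corrected (n - 1) variance. When clusters differ in size, that choice can change which cluster is selected.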

KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(Random random,
                               KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
Build a clusterer.

Parameters:
random - random generator to use for choosing initial centers
emptyStrategy - strategy to use for handling empty clusters that may appear during algorithm iterations
Since:
2.2

Method Detail

cluster

public List<Cluster<T>> cluster(Collection<T> points,
                                int k,
                                int maxIterations)
Runs the K-means++ clustering algorithm.

Parameters:
points - the points to cluster
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm; if negative, no maximum is used.
Returns:
a list of clusters containing the points
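After seeding, the clustering alternates two steps. The assignment step sends each point to its nearest center, and the update step moves each center to the mean of its points. The sketch below shows that loop with the maxIterations convention documented above, where a negative value means no cap. It is a hypothetical stand-alone helper on double[] points with Euclidean distance. Unlike the library, it leaves an empty cluster's center where it was rather than applying an EmptyClusterStrategy.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating Lloyd-style k-means iterations.
final class LloydIterations {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the nearest center; ties go to the lowest index.
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDistance = squaredDistance(p, centers[0]);
        for (int c = 1; c < centers.length; c++) {
            double d = squaredDistance(p, centers[c]);
            if (d < bestDistance) {
                best = c;
                bestDistance = d;
            }
        }
        return best;
    }

    static int[] assign(List<double[]> points, double[][] centers) {
        int[] assignment = new int[points.size()];
        for (int i = 0; i < assignment.length; i++) {
            assignment[i] = nearest(points.get(i), centers);
        }
        return assignment;
    }

    // Runs up to maxIterations update/assign rounds (no limit if negative).
    // Centers are updated in place; returns the final cluster index of each point.
    static int[] run(List<double[]> points, double[][] centers, int maxIterations) {
        int dim = centers[0].length;
        int[] assignment = assign(points, centers);
        for (int iter = 0; maxIterations < 0 || iter < maxIterations; iter++) {
            // Update step: move each non-empty cluster's center to the mean of its points.
            double[][] sums = new double[centers.length][dim];
            int[] counts = new int[centers.length];
            for (int i = 0; i < assignment.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points.get(i)[d];
                }
            }
            for (int c = 0; c < centers.length; c++) {
                if (counts[c] == 0) {
                    continue;  // empty cluster: keep its old center in this sketch
                }
                for (int d = 0; d < dim; d++) {
                    centers[c][d] = sums[c][d] / counts[c];
                }
            }
            // Assignment step: stop as soon as no point changes cluster.
            int[] next = assign(points, centers);
            if (Arrays.equals(next, assignment)) {
                break;
            }
            assignment = next;
        }
        return assignment;
    }
}
```

With a negative maxIterations, termination relies on the assignments eventually stabilizing, which Lloyd's iterations normally do. A non-negative cap bounds the work regardless.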


Copyright © 2003-2011 The Apache Software Foundation. All Rights Reserved.