public class BIRCH extends Object implements Clustering<double[]>
BIRCH has several advantages. For example, each clustering decision is made without scanning all data points and all currently existing clusters. It exploits the observation that the data space is usually not uniformly occupied and that not every data point is equally important. It makes full use of available memory to derive the finest possible sub-clusters while minimizing I/O costs. It is also an incremental method that does not require the whole data set in advance.
This implementation produces a clustering in three steps. The first step builds a CF (clustering feature) tree in a single scan of the database. The second step clusters the leaves of the CF tree by hierarchical clustering. In the final step, the user applies the learned model to cluster input data. In total, the database is scanned twice.
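The CF (clustering feature) mentioned above can be pictured with a small standalone sketch. It follows the general BIRCH formulation, where a sub-cluster is summarized by its point count N, linear sum LS, and squared sum SS; it is not a copy of this class's internals, and the class name `ClusteringFeature` and its methods are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

/** Hypothetical, simplified clustering feature (N, LS, SS) for one sub-cluster. */
final class ClusteringFeature {
    private int n;              // N: number of points absorbed so far
    private final double[] ls;  // LS: per-dimension linear sum of the points
    private double ss;          // SS: sum of squared norms of the points

    ClusteringFeature(int d) {
        ls = new double[d];
    }

    /** Absorbs a point by updating the three summaries; the point is not stored. */
    void add(double[] x) {
        n++;
        for (int i = 0; i < ls.length; i++) {
            ls[i] += x[i];
            ss += x[i] * x[i];
        }
    }

    int count() {
        return n;
    }

    /** Centroid of the sub-cluster: LS / N. */
    double[] centroid() {
        double[] c = new double[ls.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = ls[i] / n;
        }
        return c;
    }

    /** RMS distance of members to the centroid: sqrt(SS/N - ||LS/N||^2). */
    double radius() {
        double norm2 = 0.0;
        for (double v : ls) {
            norm2 += (v / n) * (v / n);
        }
        // Clamp at zero to guard against tiny negative values from rounding.
        return Math.sqrt(Math.max(0.0, ss / n - norm2));
    }

    /** Radius the sub-cluster would have after absorbing x; this CF is unchanged. */
    double radiusWith(double[] x) {
        ClusteringFeature t = new ClusteringFeature(ls.length);
        t.n = n;
        t.ss = ss;
        System.arraycopy(ls, 0, t.ls, 0, ls.length);
        t.add(x);
        return t.radius();
    }

    public static void main(String[] args) {
        double maxRadius = 1.5; // plays the role of the threshold T (assumed value)
        ClusteringFeature cf = new ClusteringFeature(2);
        cf.add(new double[] {0.0, 0.0});
        cf.add(new double[] {2.0, 0.0});
        System.out.println("centroid = " + Arrays.toString(cf.centroid()));
        System.out.println("radius   = " + cf.radius());
        double[] far = {10.0, 0.0};
        // A far-away point would push the radius past T, so it would start a new sub-cluster.
        System.out.println("absorb far point? " + (cf.radiusWith(far) <= maxRadius));
    }
}
```

Because the three summaries are plain sums, a point can be absorbed in time proportional to the dimensionality, which is what lets the tree be built in a single scan without keeping the raw points in memory.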
See Also: HierarchicalClustering, KMeans
Fields inherited from interface Clustering: OUTLIER
Constructor and Description |
---|
BIRCH(int d, int B, double T) Constructor. |
Modifier and Type | Method and Description |
---|---|
void | add(double[] x) Adds a data point to the CF tree. |
double[][] | centroids() Returns the representatives of the clusters. |
int | dimension() Returns the dimensionality of the data. |
int | getBrachingFactor() Returns the branching factor, which is the maximum number of child nodes. |
double | getMaxRadius() Returns the maximum radius of a sub-cluster. |
int | partition(int k) Clusters the leaves of the CF tree into k clusters. |
int | partition(int k, int minPts) Clusters the leaves of the CF tree into k clusters, treating leaves with fewer than minPts points as outliers. |
int | predict(double[] x) Assigns a new instance to the nearest CF leaf. |
public BIRCH(int d, int B, double T)
Parameters:
d - the dimensionality of the data.
B - the branching factor, which is the maximum number of child nodes.
T - the maximum radius of a sub-cluster.
public void add(double[] x)
public int getBrachingFactor()
public double getMaxRadius()
public int dimension()
public int partition(int k)
Parameters:
k - the number of clusters.
public int partition(int k, int minPts)
Parameters:
k - the number of clusters.
minPts - a CF leaf will be treated as an outlier if the number of its points is less than minPts.
public int predict(double[] x)
Assigns a new instance to the nearest CF leaf. Call the partition(int) method first to cluster the leaves, then call this method to cluster new data.
Specified by:
predict in interface Clustering<double[]>
Parameters:
x - a new instance.
Returns:
the cluster label, which may be Clustering.OUTLIER.
public double[][] centroids()
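The interplay of partition(int k, int minPts) and predict(double[] x) described above can be illustrated with a standalone sketch. It is only an approximation under stated assumptions: the class `NearestLeaf`, its methods, and the flat arrays of leaf representatives are hypothetical, the lookup is a linear scan rather than a tree descent, and the `OUTLIER` sentinel here is a stand-in whose value is assumed rather than taken from Clustering.OUTLIER.

```java
/** Hypothetical sketch of nearest-leaf prediction with outlier leaves. */
final class NearestLeaf {
    /** Stand-in for Clustering.OUTLIER; the value used here is an assumption. */
    static final int OUTLIER = Integer.MAX_VALUE;

    /**
     * Assigns a label to every leaf: its cluster id, or OUTLIER when the leaf
     * holds fewer than minPts points.
     */
    static int[] labelLeaves(int[] leafCounts, int[] clusterOfLeaf, int minPts) {
        int[] labels = new int[leafCounts.length];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = leafCounts[i] < minPts ? OUTLIER : clusterOfLeaf[i];
        }
        return labels;
    }

    /** Returns the label of the leaf whose representative is closest to x. */
    static int predict(double[][] representatives, int[] labels, double[] x) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < representatives.length; i++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = representatives[i][j] - x[j];
                d += diff * diff; // squared Euclidean distance is enough for ranking
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        // With no leaves at all there is nothing to match against.
        return best < 0 ? OUTLIER : labels[best];
    }

    public static void main(String[] args) {
        double[][] reps = {{0.0, 0.0}, {5.0, 5.0}, {10.0, 0.0}};
        int[] labels = labelLeaves(new int[] {5, 1, 4}, new int[] {0, 1, 1}, 2);
        System.out.println(predict(reps, labels, new double[] {0.5, -0.2}));
        System.out.println(predict(reps, labels, new double[] {5.1, 4.9}) == OUTLIER);
    }
}
```

The point of the sketch is that a new instance inherits whatever label its nearest leaf received during partitioning, so callers should be prepared to handle the outlier label as a possible return value.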
Copyright © 2015. All rights reserved.