public class MEC<T> extends PartitionClustering implements java.lang.Comparable<MEC<T>>
The clustering criterion is based on the conditional entropy H(C | x), where C is the cluster label and x is an observation. According to Fano's inequality, we can estimate C with a low probability of error only if the conditional entropy H(C | X) is small. MEC also generalizes the criterion by replacing Shannon's entropy with Havrda-Charvat's structural α-entropy. Interestingly, the minimum entropy criterion based on the structural α-entropy equals the probability of error of the nearest neighbor method when α = 2. To estimate p(C | x), MEC employs Parzen density estimation, a nonparametric approach.
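As a rough illustration of the criterion, the sketch below estimates p(C | x) with a uniform-kernel Parzen window (the share of points within a radius of x that carry each label) and evaluates both Shannon's entropy and the structural α-entropy. It is not Smile's implementation: the class name EntropySketch, the one-dimensional data, the uniform kernel, and the normalization 1 / (2^(1-α) - 1) of the α-entropy are assumptions made for the example.

```java
final class EntropySketch {
    /** Estimates p(c | x) as the share of points within radius of x that are labeled c. */
    static double[] posterior(double[] data, int[] y, int k, double x, double radius) {
        double[] p = new double[k];
        int n = 0;
        for (int i = 0; i < data.length; i++) {
            if (Math.abs(data[i] - x) <= radius) {
                p[y[i]]++;
                n++;
            }
        }
        // Normalize the counts; if x has no neighbors, p stays all zeros.
        for (int c = 0; n > 0 && c < k; c++) p[c] /= n;
        return p;
    }

    /** Shannon entropy in nats, with 0 log 0 taken as 0. */
    static double shannon(double[] p) {
        double h = 0.0;
        for (double pi : p) {
            if (pi > 0.0) h -= pi * Math.log(pi);
        }
        return h;
    }

    /** Structural alpha-entropy (assumed normalization), for alpha > 0 and alpha != 1. */
    static double structural(double[] p, double alpha) {
        double sum = 0.0;
        for (double pi : p) sum += Math.pow(pi, alpha);
        return (sum - 1.0) / (Math.pow(2.0, 1.0 - alpha) - 1.0);
    }

    /** Average of the Shannon entropy H(C | x) over all sample points. */
    static double conditionalEntropy(double[] data, int[] y, int k, double radius) {
        double h = 0.0;
        for (double x : data) h += shannon(posterior(data, y, k, x, radius));
        return h / data.length;
    }

    public static void main(String[] args) {
        double[] data = {0.0, 0.1, 0.2, 5.0, 5.1, 5.2};
        int[] clean = {0, 0, 0, 1, 1, 1}; // labels agree with the two groups
        int[] mixed = {0, 1, 0, 1, 0, 1}; // labels cut across the groups
        System.out.println(conditionalEntropy(data, clean, 2, 1.0));
        System.out.println(conditionalEntropy(data, mixed, 2, 1.0));
        System.out.println(structural(new double[] {0.5, 0.5}, 2.0));
    }
}
```

Under this normalization, α = 2 reduces the structural entropy to 2(1 - Σ p²), which is 0 when a neighborhood contains a single label and 1 for an even split between two labels.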
MEC is an iterative algorithm that starts from an initial partition produced by another clustering method, e.g. k-means, CLARANS, hierarchical clustering, etc. Note that a random initialization is NOT appropriate.
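The iterative refinement can be pictured with a deliberately simplified sketch: starting from a given partition, each point is greedily moved to the label that lowers the average neighborhood entropy, and passes repeat until the improvement falls below a tolerance. This is an assumption-laden toy, not the procedure Smile uses: the class RefinementSketch, the brute-force entropy recomputation, the one-dimensional data, and the hand-written initial labels (standing in for the output of k-means or a similar method) are all hypothetical.

```java
import java.util.Arrays;

final class RefinementSketch {
    /** Average Shannon entropy of the label distribution in each point's neighborhood. */
    static double entropy(double[] data, int[] y, int k, double radius) {
        double total = 0.0;
        for (double x : data) {
            int[] count = new int[k];
            int n = 0;
            for (int i = 0; i < data.length; i++) {
                if (Math.abs(data[i] - x) <= radius) {
                    count[y[i]]++;
                    n++;
                }
            }
            for (int c : count) {
                if (c > 0) {
                    double p = (double) c / n;
                    total -= p * Math.log(p);
                }
            }
        }
        return total / data.length;
    }

    /** Greedily relabels points until a full pass improves the entropy by at most tol. */
    static int[] refine(double[] data, int[] init, int k, double radius, double tol) {
        int[] y = init.clone();
        double current = entropy(data, y, k, radius);
        double before;
        do {
            before = current;
            for (int i = 0; i < y.length; i++) {
                int best = y[i];
                for (int c = 0; c < k; c++) {
                    y[i] = c;
                    double h = entropy(data, y, k, radius);
                    // Require a clear improvement so rounding noise cannot flip a label.
                    if (h < current - 1e-12) {
                        current = h;
                        best = c;
                    }
                }
                y[i] = best;
            }
        } while (before - current > tol);
        return y;
    }

    public static void main(String[] args) {
        double[] data = {0.0, 0.1, 0.2, 5.0, 5.1, 5.2};
        int[] init = {0, 0, 1, 1, 1, 1}; // stand-in initial partition with one mislabeled point
        System.out.println(Arrays.toString(refine(data, init, 2, 1.0, 1e-4)));
        // prints [0, 0, 0, 1, 1, 1]
    }
}
```

Because a greedy descent of this kind only reaches a local minimum, the result depends on the starting partition. A label can also lose all of its points along the way, so the final number of clusters may be smaller than k.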
Modifier and Type | Field and Description |
---|---|
double | entropy - The conditional entropy as the objective function. |
double | radius - The range of neighborhood. |
Fields inherited from class PartitionClustering: k, OUTLIER, size, y
Constructor and Description |
---|
MEC(double entropy, double radius, RNNSearch<T,T> nns, int k, int[] y) - Constructor. |
Modifier and Type | Method and Description |
---|---|
int | compareTo(MEC<T> o) |
static <T> MEC<T> | fit(T[] data, smile.math.distance.Distance<T> distance, int k, double radius) - Clustering the data. |
static <T> MEC<T> | fit(T[] data, RNNSearch<T,T> nns, int k, double radius, int[] y, double tol) - Clustering the data. |
int | predict(T x) - Cluster a new instance. |
java.lang.String | toString() |
Methods inherited from class PartitionClustering: run, seed
public final double entropy
public final double radius
public static <T> MEC<T> fit(T[] data, smile.math.distance.Distance<T> distance, int k, double radius)
data - the observations.
distance - the distance measure for neighborhood search.
k - the number of clusters. Note that this is just a hint. The final number of clusters may be less.
radius - the neighborhood radius.

public static <T> MEC<T> fit(T[] data, RNNSearch<T,T> nns, int k, double radius, int[] y, double tol)
data - the observations.
nns - the neighborhood search data structure.
k - the number of clusters. Note that this is just a hint. The final number of clusters may be less.
radius - the neighborhood radius.
y - the initial clustering labels, which could be produced by any other clustering method.
tol - the tolerance of the convergence test.

public int predict(T x)
x - a new instance.
May return PartitionClustering.OUTLIER.

public java.lang.String toString()
Overrides toString in class PartitionClustering
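For predict(T x), the reference to PartitionClustering.OUTLIER suggests that some instances receive no cluster label. One plausible reading, sketched here as an assumption rather than as the library's code, is a majority vote among the training points within the neighborhood radius, with the outlier marker returned when the neighborhood is empty. The class PredictSketch, its OUTLIER constant, and the one-dimensional data are hypothetical stand-ins.

```java
final class PredictSketch {
    /** Hypothetical stand-in for PartitionClustering.OUTLIER; the real value is not stated here. */
    static final int OUTLIER = Integer.MAX_VALUE;

    /** Returns the most frequent label among points within radius of x, or OUTLIER if there are none. */
    static int predict(double[] data, int[] y, int k, double radius, double x) {
        int[] count = new int[k];
        int n = 0;
        for (int i = 0; i < data.length; i++) {
            if (Math.abs(data[i] - x) <= radius) {
                count[y[i]]++;
                n++;
            }
        }
        if (n == 0) return OUTLIER;

        // Ties are resolved in favor of the smaller label.
        int best = 0;
        for (int c = 1; c < k; c++) {
            if (count[c] > count[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] data = {0.0, 0.1, 0.2, 5.0, 5.1, 5.2};
        int[] y = {0, 0, 0, 1, 1, 1};
        System.out.println(predict(data, y, 2, 1.0, 0.05)); // near the first group
        System.out.println(predict(data, y, 2, 1.0, 5.3));  // near the second group
        System.out.println(predict(data, y, 2, 1.0, 2.5) == OUTLIER); // no neighbors in range
    }
}
```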