public class DENCLUE extends PartitionClustering
DENCLUE does not work on uniformly distributed data. In high-dimensional spaces, data tend to look uniformly distributed because of the curse of dimensionality, so DENCLUE generally does not work well on high-dimensional data.
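The effect behind this limitation can be seen with a small experiment. The sketch below is not part of this class; `DistanceConcentration` and `minMaxRatio` are hypothetical names chosen for illustration. It draws points uniformly from the unit hypercube and reports the ratio of the smallest to the largest pairwise Euclidean distance. As the dimension grows, the ratio moves toward 1, so a Gaussian kernel weights all neighbors almost equally and the density estimate flattens out.

```java
import java.util.Random;

// Illustrative only: not part of the DENCLUE class documented here.
class DistanceConcentration {
    /**
     * Returns min/max pairwise Euclidean distance among n points drawn
     * uniformly from the unit hypercube [0, 1]^d.
     */
    static double minMaxRatio(int n, int d, long seed) {
        Random rng = new Random(seed);
        double[][] x = new double[n][d];
        for (double[] row : x) {
            for (int j = 0; j < d; j++) row[j] = rng.nextDouble();
        }
        double min = Double.POSITIVE_INFINITY;
        double max = 0.0;
        for (int i = 0; i < n; i++) {
            for (int k = i + 1; k < n; k++) {
                double sum = 0.0;
                for (int j = 0; j < d; j++) {
                    double diff = x[i][j] - x[k][j];
                    sum += diff * diff;
                }
                double dist = Math.sqrt(sum);
                min = Math.min(min, dist);
                max = Math.max(max, dist);
            }
        }
        return min / max;
    }

    public static void main(String[] args) {
        // Print the ratio for increasing dimensions.
        for (int d : new int[] {2, 10, 100, 1000}) {
            System.out.printf("d = %4d  min/max distance = %.3f%n", d, minMaxRatio(200, d, 42L));
        }
    }
}
```

A ratio close to 0 means some points are much closer together than others, which is what density-based clustering relies on. A ratio close to 1 means there is little density contrast left to exploit.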
Modifier and Type | Field and Description |
---|---|
double[][] | attractors: The density attractor of each observation. |
Fields inherited from class PartitionClustering: k, OUTLIER, size, y
Constructor and Description |
---|
DENCLUE(int k, double[][] attractors, double[] radius, double[][] samples, double sigma, int[] y, double tol): Constructor. |
Modifier and Type | Method and Description |
---|---|
static DENCLUE | fit(double[][] data, double sigma, int m): Clustering data. |
static DENCLUE | fit(double[][] data, double sigma, int m, double tol, int minPts): Clustering data. |
int | predict(double[] x): Classifies a new observation. |
Methods inherited from class PartitionClustering: run, seed, toString
public final double[][] attractors
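A density attractor is a local maximum of the kernel density estimate that an observation reaches by hill climbing. The sketch below shows one common way to do this for a Gaussian kernel, a mean-shift style fixed-point update. It is an assumption for illustration and may differ from what this class does internally. `HillClimbSketch`, `climb`, and `maxIter` are hypothetical names, while `sigma` and `tol` play the roles described for the constructor parameters.

```java
// Illustrative only: a generic Gaussian-kernel hill climb, not the library's code.
class HillClimbSketch {
    /**
     * Moves a point uphill on the Gaussian kernel density estimate of the
     * samples by repeating the fixed-point update
     *   x <- sum(w_i * s_i) / sum(w_i),  w_i = exp(-||x - s_i||^2 / (2 * sigma^2)),
     * until the step is shorter than tol or maxIter iterations have run.
     */
    static double[] climb(double[] start, double[][] samples, double sigma, double tol, int maxIter) {
        int d = start.length;
        double[] x = start.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[d];
            double weightSum = 0.0;
            for (double[] s : samples) {
                double dist2 = 0.0;
                for (int j = 0; j < d; j++) {
                    double diff = x[j] - s[j];
                    dist2 += diff * diff;
                }
                double w = Math.exp(-dist2 / (2.0 * sigma * sigma));
                weightSum += w;
                for (int j = 0; j < d; j++) next[j] += w * s[j];
            }
            if (weightSum == 0.0) break; // every kernel weight underflowed; stop here
            double step2 = 0.0;
            for (int j = 0; j < d; j++) {
                next[j] /= weightSum;
                double diff = next[j] - x[j];
                step2 += diff * diff;
            }
            x = next;
            if (Math.sqrt(step2) < tol) break; // converged within tolerance
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] samples = {
            {0.0, 0.0}, {0.2, 0.0}, {0.0, 0.2}, {0.2, 0.2},  // group centered at (0.1, 0.1)
            {5.0, 5.0}, {5.2, 5.0}, {5.0, 5.2}, {5.2, 5.2}   // group centered at (5.1, 5.1)
        };
        double[] a = climb(new double[] {0.0, 0.0}, samples, 0.5, 1e-6, 100);
        double[] b = climb(new double[] {5.2, 5.0}, samples, 0.5, 1e-6, 100);
        System.out.printf("attractor a = (%.3f, %.3f)%n", a[0], a[1]);
        System.out.printf("attractor b = (%.3f, %.3f)%n", b[0], b[1]);
    }
}
```

In this toy data set, observations from the same group climb to the same attractor near the group center, which is the property a density-based clustering can use to assign labels.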
public DENCLUE(int k, double[][] attractors, double[] radius, double[][] samples, double sigma, int[] y, double tol)
Parameters:
k - the number of clusters.
attractors - the density attractor of each observation.
radius - the radius of each density attractor.
sigma - the smoothing parameter of the Gaussian kernel. Choose sigma such that the number of density attractors stays constant over a long interval of sigma values.
y - the cluster labels.
tol - the tolerance of the hill-climbing procedure.

public static DENCLUE fit(double[][] data, double sigma, int m)
Parameters:
data - the input data, in which each row is an observation.
sigma - the smoothing parameter of the Gaussian kernel. Choose sigma such that the number of density attractors stays constant over a long interval of sigma values.
m - the number of samples selected for the iteration. It should be much smaller than the number of observations to speed up the algorithm, yet large enough to capture the underlying distribution.

public static DENCLUE fit(double[][] data, double sigma, int m, double tol, int minPts)
Parameters:
data - the input data, in which each row is an observation.
sigma - the smoothing parameter of the Gaussian kernel. Choose sigma such that the number of density attractors stays constant over a long interval of sigma values.
m - the number of samples selected for the iteration. It should be much smaller than the number of observations to speed up the algorithm, yet large enough to capture the underlying distribution.
tol - the tolerance of the hill-climbing procedure.
minPts - the minimum number of neighbors for a core attractor.

public int predict(double[] x)
Parameters:
x - a new observation.
Returns:
the cluster label, which may be PartitionClustering.OUTLIER.
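The advice on choosing sigma can be tried on toy data. The sketch below is a simplified, self-contained 1-D illustration and does not call this class; `SigmaSweepSketch`, `climb`, `countAttractors`, and `eps` are hypothetical names. It assumes a mean-shift style hill climb under a Gaussian kernel and counts the distinct attractors for several sigma values, so that a plateau in the count becomes visible.

```java
// Illustrative only: a toy 1-D sweep, not the library's implementation.
class SigmaSweepSketch {
    /** Hill-climbs x on the 1-D Gaussian kernel density estimate of data. */
    static double climb(double x, double[] data, double sigma, double tol) {
        for (int iter = 0; iter < 1000; iter++) {
            double num = 0.0;
            double den = 0.0;
            for (double s : data) {
                double w = Math.exp(-(x - s) * (x - s) / (2.0 * sigma * sigma));
                num += w * s;
                den += w;
            }
            double next = num / den; // kernel-weighted mean of the data
            double step = Math.abs(next - x);
            x = next;
            if (step < tol) break;
        }
        return x;
    }

    /** Counts distinct attractors, treating attractors closer than eps as one. */
    static int countAttractors(double[] data, double sigma, double eps) {
        double[] found = new double[data.length];
        int count = 0;
        for (double x : data) {
            double a = climb(x, data, sigma, 1e-9);
            boolean seen = false;
            for (int i = 0; i < count; i++) {
                if (Math.abs(found[i] - a) < eps) {
                    seen = true;
                    break;
                }
            }
            if (!seen) found[count++] = a;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two tight groups, one around 0 and one around 10.
        double[] data = {-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2};
        for (double sigma : new double[] {0.01, 0.5, 1.0, 2.0, 20.0}) {
            System.out.printf("sigma = %5.2f  attractors = %d%n", sigma, countAttractors(data, sigma, 1e-3));
        }
    }
}
```

On this toy data the count is 10 for sigma = 0.01 (every point is its own attractor), 2 for sigma = 0.5, 1.0, and 2.0, and 1 for sigma = 20.0. The stable stretch at 2 is the kind of interval the sigma description refers to.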