T - the type of input object.

public class DBSCAN<T> extends PartitionClustering
DBSCAN requires two parameters: radius (the neighborhood radius) and minPts (the minimum number of points required to form a cluster). It starts with an arbitrary point that has not been visited. This point's neighborhood is retrieved, and if it contains a sufficient number of points, a cluster is started. Otherwise, the point is labeled as noise. Note that this point might later be found within the radius neighborhood of a different point and hence be made part of a cluster.
If a point is found to be part of a cluster, its neighborhood is also part of that cluster. Hence, all points that are found within the neighborhood are added, as are their own neighborhoods. This process continues until the cluster is completely found. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or of noise.
DBSCAN visits each point of the database, possibly multiple times (e.g., as candidates to different clusters). For practical considerations, however, the time complexity is mostly governed by the number of nearest neighbor queries. DBSCAN executes exactly one such query for each point, and if an indexing structure is used that executes such a neighborhood query in O(log n), an overall runtime complexity of O(n log n) is obtained.
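The procedure described above can be sketched in plain Java. This is a minimal illustration, not the implementation behind this class: the names `DbscanSketch`, `cluster`, `neighbors`, `NOISE`, and `UNVISITED` are hypothetical, the range query is a brute-force scan rather than an index lookup (so the sketch runs in O(n^2) rather than O(n log n)), and it assumes Euclidean distance and that minPts counts neighbors excluding the point itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class DbscanSketch {
    /** Label for points that belong to no cluster (hypothetical constant for this sketch). */
    static final int NOISE = -1;
    /** Internal marker for points that have not been processed yet. */
    static final int UNVISITED = -2;

    /** Brute-force range query: indices of all points within radius of data[i], excluding i. */
    static List<Integer> neighbors(double[][] data, int i, double radius) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < data.length; j++) {
            if (j == i) continue;
            double sum = 0.0;
            for (int d = 0; d < data[i].length; d++) {
                double diff = data[i][d] - data[j][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= radius) result.add(j);
        }
        return result;
    }

    /** Returns one label per point: a cluster id starting at 0, or NOISE. */
    static int[] cluster(double[][] data, int minPts, double radius) {
        int[] labels = new int[data.length];
        Arrays.fill(labels, UNVISITED);
        int k = 0; // id of the next cluster to be created

        for (int i = 0; i < data.length; i++) {
            if (labels[i] != UNVISITED) continue;

            List<Integer> seeds = neighbors(data, i, radius);
            if (seeds.size() < minPts) {
                labels[i] = NOISE; // may later be absorbed into a cluster as a border point
                continue;
            }

            // Point i is a core point: start a new cluster and expand it.
            labels[i] = k;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == NOISE) {
                    labels[j] = k; // former noise point becomes a border point
                    continue;
                }
                if (labels[j] != UNVISITED) continue; // already assigned to a cluster
                labels[j] = k;
                // Exactly one range query per point; only core points extend the cluster.
                List<Integer> more = neighbors(data, j, radius);
                if (more.size() >= minPts) queue.addAll(more);
            }
            k++;
        }
        return labels;
    }
}
```

Calling `DbscanSketch.cluster(data, 3, 1.5)` on two well-separated 2x2 grids of unit spacing plus one distant point would assign the grids to clusters 0 and 1 and label the distant point as `NOISE`. Replacing `neighbors` with an index-backed range search is what brings the overall cost down to the O(n log n) mentioned above.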
DBSCAN has many advantages such as
Modifier and Type | Field and Description |
---|---|
double | minPts: The minimum number of points required to form a cluster. |
double | radius: The neighborhood radius. |
Inherited fields: k, OUTLIER, size, y
Constructor and Description |
---|
DBSCAN(int minPts, double radius, RNNSearch<T,T> nns, int k, int[] y): Constructor. |
Modifier and Type | Method and Description |
---|---|
static DBSCAN<double[]> | fit(double[][] data, int minPts, double radius): Clusters the data using a KD-tree. |
static <T> DBSCAN<T> | fit(T[] data, smile.math.distance.Distance<T> distance, int minPts, double radius): Clusters the data using the given distance measure. |
static <T> DBSCAN<T> | fit(T[] data, RNNSearch<T,T> nns, int minPts, double radius): Clusters the data using the given neighborhood search structure. |
int | predict(T x): Classifies a new observation. |
Inherited methods: run, seed, toString
public final double minPts
public final double radius
public DBSCAN(int minPts, double radius, RNNSearch<T,T> nns, int k, int[] y)
minPts - the minimum number of neighbors for a core data point.
radius - the neighborhood radius.
nns - the data structure for neighborhood search.
k - the number of clusters.
y - the cluster labels.

public static DBSCAN<double[]> fit(double[][] data, int minPts, double radius)
data - the observations.
minPts - the minimum number of neighbors for a core data point.
radius - the neighborhood radius.

public static <T> DBSCAN<T> fit(T[] data, smile.math.distance.Distance<T> distance, int minPts, double radius)
data - the observations.
distance - the distance measure for neighborhood search.
minPts - the minimum number of neighbors for a core data point.
radius - the neighborhood radius.

public static <T> DBSCAN<T> fit(T[] data, RNNSearch<T,T> nns, int minPts, double radius)
data - the observations.
nns - the data structure for neighborhood search.
minPts - the minimum number of neighbors for a core data point.
radius - the neighborhood radius.

public int predict(T x)
x - a new observation.
Returns the cluster label, which may be PartitionClustering.OUTLIER.