T
- the type of input object.
public class DBScan<T> extends PartitionClustering<T>
DBScan requires two parameters: radius (i.e. the neighborhood radius) and the minimum number of points required to form a cluster (minPts). It starts with an arbitrary point that has not been visited. This point's neighborhood is retrieved, and if it contains a sufficient number of points, a cluster is started. Otherwise, the point is labeled as noise. Note that this point might later be found in the sufficiently populated radius-neighborhood of a different point and hence be made part of a cluster.
If a point is found to be part of a cluster, its neighborhood is also part of that cluster. Hence, all points that are found within the neighborhood are added, as are their own neighborhoods. This process continues until the cluster is completely found. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or of noise.
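The procedure described above can be sketched in a few lines of Java. This is a minimal illustration, not the library's implementation: the class `DbscanSketch`, the labels `UNVISITED` and `NOISE`, and the brute-force `regionQuery` are hypothetical names chosen for the example, Euclidean distance is assumed, and a point is assumed to count as a member of its own neighborhood.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch class; not part of the documented API.
class DbscanSketch {
    static final int UNVISITED = -2; // point has not been processed yet
    static final int NOISE = -1;     // assumed noise label for this sketch

    /** Brute-force neighborhood query: indices of all points within radius of point i (i included). */
    static List<Integer> regionQuery(double[][] data, int i, double radius) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < data.length; j++) {
            double sum = 0.0;
            for (int d = 0; d < data[i].length; d++) {
                double diff = data[i][d] - data[j][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= radius) result.add(j);
        }
        return result;
    }

    /** Returns one label per point: a cluster id starting at 0, or NOISE. */
    static int[] cluster(double[][] data, int minPts, double radius) {
        int[] label = new int[data.length];
        Arrays.fill(label, UNVISITED);
        int clusterId = 0;

        for (int i = 0; i < data.length; i++) {
            if (label[i] != UNVISITED) continue;

            List<Integer> neighbors = regionQuery(data, i, radius);
            if (neighbors.size() < minPts) {
                label[i] = NOISE; // may still be absorbed by a cluster later
                continue;
            }

            // Point i is a core point: start a new cluster and grow it.
            label[i] = clusterId;
            Deque<Integer> queue = new ArrayDeque<>(neighbors);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (label[j] == NOISE) {
                    label[j] = clusterId; // former noise becomes a border point
                    continue;
                }
                if (label[j] != UNVISITED) continue; // already assigned
                label[j] = clusterId;
                List<Integer> reachable = regionQuery(data, j, radius);
                // Only core points extend the cluster further.
                if (reachable.size() >= minPts) queue.addAll(reachable);
            }
            clusterId++;
        }
        return label;
    }
}
```

Each point triggers at most one call to `regionQuery`, either in the outer loop or when it is first taken from the queue, so swapping the brute-force scan for an indexed search changes the cost of each query but not the number of queries.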
DBScan visits each point of the database, possibly multiple times (e.g., as a candidate for different clusters). For practical considerations, however, the time complexity is mostly governed by the number of neighborhood queries. DBScan executes exactly one such query for each point, and if an indexing structure is used that executes such a neighborhood query in O(log n), an overall runtime complexity of O(n log n) is obtained.
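To make the role of the index concrete, the following sketch shows a logarithmic-time neighborhood count for the simplest possible case, one-dimensional data kept in a sorted array. It is only an illustration under that assumption: `SortedIndexSketch` and its methods are hypothetical names, and multi-dimensional data would need a spatial index instead.

```java
import java.util.Arrays;

// Hypothetical 1-D index; illustrates an O(log n) neighborhood count.
class SortedIndexSketch {
    private final double[] sorted;

    SortedIndexSketch(double[] values) {
        sorted = values.clone();
        Arrays.sort(sorted); // one-time O(n log n) build cost
    }

    /** First position whose value is >= key. */
    private int lowerBound(double key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** First position whose value is > key. */
    private int upperBound(double key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Number of stored values in [x - radius, x + radius], found with two binary searches. */
    int countWithin(double x, double radius) {
        return upperBound(x + radius) - lowerBound(x - radius);
    }
}
```

Deciding whether a point has at least minPts neighbors needs only the count, which costs O(log n) here; listing the neighbors themselves adds time proportional to how many there are.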
DBScan has many advantages such as
OUTLIER
Constructor and Description |
---|
DBScan(T[] data, Distance<T> distance, int minPts, double radius) Constructor. |
DBScan(T[] data, Metric<T> distance, int minPts, double radius) Constructor. |
DBScan(T[] data, RNNSearch<T,T> nns, int minPts, double radius) Clustering the data. |
Modifier and Type | Method and Description |
---|---|
double | getMinPts() Returns the parameter of minimum number of neighbors. |
double | getRadius() Returns the radius of neighborhood. |
int | predict(T x) Cluster a new instance. |
String | toString() |
getClusterLabel, getClusterSize, getNumClusters
public DBScan(T[] data, Distance<T> distance, int minPts, double radius)
data
- the dataset for clustering.
distance
- the distance measure for neighborhood search.
minPts
- the minimum number of neighbors for a core data point.
radius
- the neighborhood radius.
public DBScan(T[] data, Metric<T> distance, int minPts, double radius)
data
- the dataset for clustering.
distance
- the distance measure for neighborhood search.
minPts
- the minimum number of neighbors for a core data point.
radius
- the neighborhood radius.
public double getMinPts()
public double getRadius()
public int predict(T x)
x
- a new instance.
Returns the cluster label, which may be Clustering.OUTLIER.
Copyright © 2015. All rights reserved.