public class UMAP
extends java.lang.Object
implements java.io.Serializable
From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.
See Also: TSNE, Serialized Form

Modifier and Type | Field and Description |
---|---|
double[][] | coordinates: The coordinate matrix in embedding space. |
smile.graph.AdjacencyList | graph: The nearest neighbor graph. |
int[] | index: The original sample index. |
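The field descriptions suggest that row `i` of `coordinates` is the embedding of the input sample whose original position is `index[i]`. The sketch below relies on that pairing, which is an assumption, and on the further assumption that `index` may cover only a subset of the input. `EmbeddingRows` and `scatter` are hypothetical names, not part of Smile; the helper places each embedded row back at its original position and fills the rows of any missing samples with NaN.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Smile. */
final class EmbeddingRows {
    /**
     * Returns an n-by-d matrix whose row index[i] is a copy of coordinates[i].
     * Rows for samples that do not appear in index are filled with NaN.
     * Assumes coordinates[i] belongs to original sample index[i].
     */
    static double[][] scatter(int n, int[] index, double[][] coordinates) {
        int d = coordinates.length == 0 ? 0 : coordinates[0].length;
        double[][] full = new double[n][d];
        for (double[] row : full) {
            Arrays.fill(row, Double.NaN);
        }
        for (int i = 0; i < index.length; i++) {
            full[index[i]] = coordinates[i].clone();
        }
        return full;
    }
}
```

With a fitted model, a call such as `EmbeddingRows.scatter(data.length, umap.index, umap.coordinates)` would line the embedding up with the original data rows.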
Constructor and Description |
---|
UMAP(int[] index, double[][] coordinates, smile.graph.AdjacencyList graph): Constructor. |
Modifier and Type | Method and Description |
---|---|
static UMAP | of(double[][] data): Runs the UMAP algorithm. |
static UMAP | of(double[][] data, int k): Runs the UMAP algorithm. |
static UMAP | of(double[][] data, int k, int d, int iterations, double learningRate, double minDist, double spread, int negativeSamples, double repulsionStrength): Runs the UMAP algorithm. |
static <T> UMAP | of(T[] data, smile.math.distance.Distance<T> distance): Runs the UMAP algorithm. |
static <T> UMAP | of(T[] data, smile.math.distance.Distance<T> distance, int k): Runs the UMAP algorithm. |
static <T> UMAP | of(T[] data, smile.math.distance.Distance<T> distance, int k, int d, int iterations, double learningRate, double minDist, double spread, int negativeSamples, double repulsionStrength): Runs the UMAP algorithm. |
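The parameter notes further down this page give a few rules of thumb for the full overloads: `iterations` must be at least 10, with 200 suggested for large data (10000+ samples, the threshold given for the `double[][]` overload) and 500 for small data; `minDist` should be no greater than `spread`; and `k` is generally in the range 2 to 100. A minimal sketch of those rules follows. `UmapParams` and its methods are hypothetical names, not part of Smile, and treating the `minDist` guidance as a hard check is a choice made for this sketch.

```java
/** Hypothetical helper for illustration; not part of Smile. */
final class UmapParams {
    /** 200 iterations for large data (10000+ samples), 500 otherwise. */
    static int suggestedIterations(int samples) {
        return samples >= 10000 ? 200 : 500;
    }

    /**
     * Rejects an iteration count below 10 and a minDist larger than spread.
     * Treating the minDist guidance as a hard limit is an assumption of this sketch.
     */
    static void check(int iterations, double minDist, double spread) {
        if (iterations < 10) {
            throw new IllegalArgumentException("iterations must be at least 10: " + iterations);
        }
        if (minDist > spread) {
            throw new IllegalArgumentException("minDist must not exceed spread: " + minDist + " > " + spread);
        }
    }

    /** True when k lies in the range the notes call typical (2 to 100). */
    static boolean isTypicalK(int k) {
        return k >= 2 && k <= 100;
    }
}
```

A caller could run `UmapParams.check(...)` on its own settings before passing them to one of the full `of(...)` overloads.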
public final double[][] coordinates
public final int[] index
public final smile.graph.AdjacencyList graph
public UMAP(int[] index, double[][] coordinates, smile.graph.AdjacencyList graph)
index - the original sample index.
coordinates - the coordinates.
graph - the nearest neighbor graph.

public static UMAP of(double[][] data)
data - the input data.

public static <T> UMAP of(T[] data, smile.math.distance.Distance<T> distance)
data - the input data.
distance - the distance measure.

public static UMAP of(double[][] data, int k)
data - the input data.
k - the number of nearest neighbors. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. Generally in the range 2 to 100.

public static <T> UMAP of(T[] data, smile.math.distance.Distance<T> distance, int k)
data - the input data.
distance - the distance measure.
k - the number of nearest neighbors. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. Generally in the range 2 to 100.

public static UMAP of(double[][] data, int k, int d, int iterations, double learningRate, double minDist, double spread, int negativeSamples, double repulsionStrength)
data - the input data.
k - the number of nearest neighbors. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. Generally in the range 2 to 100.
d - the target embedding dimension. Defaults to 2 for easy visualization, but can reasonably be set to any integer value in the range 2 to 100.
iterations - the number of iterations used to optimize the low-dimensional representation. Larger values result in a more accurate embedding. Must be at least 10. Choose a value based on the size of the input data, e.g., 200 for large data (10000+ samples), 500 for small.
learningRate - the initial learning rate for the embedding optimization. Default 1.
minDist - the desired separation between close points in the embedding space. Smaller values result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values result in a more even dispersal of points. The value should be no greater than the spread value and set relative to it, since spread determines the scale at which embedded points are spread out. Default 0.1.
spread - the effective scale of embedded points. In combination with minDist, this determines how clustered/clumped the embedded points are. Default 1.0.
negativeSamples - the number of negative samples to select per positive sample in the optimization process. Increasing this value results in greater repulsive force being applied and greater optimization cost, but slightly more accuracy. Default 5.
repulsionStrength - the weighting applied to negative samples in the low-dimensional embedding optimization. Values higher than one give greater weight to negative samples. Default 1.0.

public static <T> UMAP of(T[] data, smile.math.distance.Distance<T> distance, int k, int d, int iterations, double learningRate, double minDist, double spread, int negativeSamples, double repulsionStrength)
data - the input data.
distance - the distance measure.
k - the number of nearest neighbors. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. Generally in the range 2 to 100.
d - the target embedding dimension. Defaults to 2 for easy visualization, but can reasonably be set to any integer value in the range 2 to 100.
iterations - the number of iterations used to optimize the low-dimensional representation. Larger values result in a more accurate embedding. Must be at least 10. Choose a value based on the size of the input data, e.g., 200 for large data (10000+ samples), 500 for small.
learningRate - the initial learning rate for the embedding optimization. Default 1.
minDist - the desired separation between close points in the embedding space. Smaller values result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values result in a more even dispersal of points. The value should be no greater than the spread value and set relative to it, since spread determines the scale at which embedded points are spread out. Default 0.1.
spread - the effective scale of embedded points. In combination with minDist, this determines how clustered/clumped the embedded points are. Default 1.0.
negativeSamples - the number of negative samples to select per positive sample in the optimization process. Increasing this value results in greater repulsive force being applied and greater optimization cost, but slightly more accuracy. Default 5.
repulsionStrength - the weighting applied to negative samples in the low-dimensional embedding optimization. Values higher than one give greater weight to negative samples. Default 1.0.
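
To make the interaction of `minDist` and `spread` concrete: in the reference UMAP formulation (as implemented in the Python umap-learn package), the two values define a target similarity curve over embedding distance that is flat at 1 up to `minDist` and then decays exponentially at a scale set by `spread`. A smooth curve `1 / (1 + a * x^(2b))` is then fitted to that target. Whether this class derives its curve in exactly this way is an assumption; `UmapCurve` and its methods are hypothetical names used only for this sketch, and the fitting of `a` and `b` is not shown.

```java
/** Hypothetical illustration of the minDist/spread curves; not part of Smile. */
final class UmapCurve {
    /**
     * Target similarity at embedding distance x: 1 below minDist,
     * then exponential decay whose scale is set by spread.
     */
    static double target(double x, double minDist, double spread) {
        return x < minDist ? 1.0 : Math.exp(-(x - minDist) / spread);
    }

    /** Smooth family fitted to the target: 1 / (1 + a * x^(2b)). */
    static double smooth(double x, double a, double b) {
        return 1.0 / (1.0 + a * Math.pow(x, 2 * b));
    }
}
```

A larger `minDist` keeps the target at 1 over a wider range of distances, so close points are not pulled together as tightly, which matches the more even dispersal described for that parameter.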