Umap
Attributes
- Supertypes
-
class Object
trait Matchable
class Any
- Self type
-
Umap.type
Members list
Value members
Concrete methods
Dimension reduction similar to UMAP. For reference see https://arxiv.org/abs/1802.03426. This method does not follow the above paper exactly.
Maximizes the objective function: L(x) = L_attraction(x) + L_repulsion(x)
L_attraction(x) = sum over (i,j) edges of b_ij * ln(f(x_i, x_j)), where:
- b_ij is the value of the 'UMAP graph' as in the above paper
- x_i is the low-dimensional coordinate of the i-th sample
- f(x, y) = 1 if ||x - y||_2 < minDist, otherwise exp(-(||x - y||_2 - minDist))
L_repulsion(x) = sum over (i,j) edges of (1 - b_ij) * ln(1 - f(x_i, x_j)), evaluated with sampling: in each iteration, L_repulsion is evaluated by randomly sampling from all (i,j) edges having b_ij = 0.
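As an illustration, the per-edge terms above can be sketched in plain Scala (function names here are ours, not the library's):

```scala
import scala.math.{exp, log, sqrt}

// Euclidean distance (the only metric used by this method)
def dist(x: Array[Double], y: Array[Double]): Double =
  sqrt(x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum)

// f(x, y) = 1 if ||x - y||_2 < minDist, else exp(-(||x - y||_2 - minDist))
def f(x: Array[Double], y: Array[Double], minDist: Double): Double = {
  val d = dist(x, y)
  if (d < minDist) 1.0 else exp(-(d - minDist))
}

// Attraction term of one (i, j) edge: b_ij * ln(f(x_i, x_j))
def attractionTerm(bij: Double, xi: Array[Double], xj: Array[Double], minDist: Double): Double =
  bij * log(f(xi, xj, minDist))

// Repulsion term of one sampled edge with b_ij = 0: ln(1 - f(x_i, x_j))
def repulsionTerm(xi: Array[Double], xj: Array[Double], minDist: Double): Double =
  log(1.0 - f(xi, xj, minDist))
```

Both terms are non-positive and reach 0 when f is 1 on positive edges and 0 on sampled negative edges, which is why the objective is pushed upward.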
Nearest neighbor search is evaluated by brute force. It may be batched, and may be evaluated on the GPU.
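A minimal CPU sketch of such a batched brute-force KNN search (illustrative only; the library runs this on the configured device):

```scala
// Brute-force KNN: for each row, sort all rows by squared Euclidean distance
// and keep the k closest. Queries are processed in minibatches so that only a
// batch-sized slice of the full distance matrix is materialized at a time.
def bruteForceKnn(data: Array[Array[Double]], k: Int, batchSize: Int): Array[Array[Int]] = {
  def d2(x: Array[Double], y: Array[Double]): Double =
    x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum
  data
    .grouped(batchSize)
    .flatMap(batch => batch.iterator.map { row =>
      data.indices.sortBy(j => d2(row, data(j))).take(k).toArray
    })
    .toArray
}
```

Note that each sample is at distance 0 from itself, so it appears among its own nearest neighbors, matching the semantics of the k parameter below.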
L(x) is maximized by gradient ascent, in particular Adam. Derivatives of L(x) are computed using reverse-mode automatic differentiation (autograd). The gradient updates may be evaluated on the GPU.
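The Adam update itself is standard; here is a self-contained sketch of one step (textbook Adam, not the library's implementation). Maximizing L(x) is equivalent to minimizing -L(x), so the minimization form below applies with the gradient negated:

```scala
// One step of textbook Adam (Kingma & Ba, 2014), minimization form.
final case class AdamState(m: Array[Double], v: Array[Double], t: Int)

def adamStep(
    x: Array[Double],
    grad: Array[Double], // gradient of the objective at x
    s: AdamState,
    lr: Double = 0.01,
    b1: Double = 0.9,
    b2: Double = 0.999,
    eps: Double = 1e-8
): (Array[Double], AdamState) = {
  val t = s.t + 1
  // Exponential moving averages of the gradient and its square
  val m = s.m.zip(grad).map { case (mi, g) => b1 * mi + (1 - b1) * g }
  val v = s.v.zip(grad).map { case (vi, g) => b2 * vi + (1 - b2) * g * g }
  val xNext = x.indices.toArray.map { i =>
    // Bias-corrected moment estimates
    val mHat = m(i) / (1 - math.pow(b1, t))
    val vHat = v(i) / (1 - math.pow(b2, t))
    x(i) - lr * mHat / (math.sqrt(vHat) + eps)
  }
  (xNext, AdamState(m, v, t))
}
```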
The distance metric is always Euclidean.
Differences to the algorithm described in the UMAP paper:
- The paper describes a smooth approximation of the function 'f' (Definition 11). That approximation is not used in this code.
- The paper describes an optimization procedure different from the approach taken here: it samples each edge according to b_ij and updates the vertices one after the other. The current code updates all locations together according to the derivative of L(x).
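The sampled repulsion can be illustrated as follows. This is an assumption-laden sketch: it draws negative pairs uniformly among non-edges, which may differ from the actual sampling scheme in the code:

```scala
import scala.util.Random

// Draw `negativeSampleSize` random non-edges (pairs with b_ij = 0) per
// positive edge, rejecting self-pairs and positive edges. Assumes the graph
// is sparse enough that rejection sampling terminates quickly.
def sampleNegativeEdges(
    n: Int,                          // number of samples
    positiveEdges: Set[(Int, Int)],  // edges with b_ij > 0
    negativeSampleSize: Int,
    rng: Random
): Vector[(Int, Int)] =
  Iterator
    .continually((rng.nextInt(n), rng.nextInt(n)))
    .filter { case (i, j) => i != j && !positiveEdges((i, j)) }
    .take(negativeSampleSize * positiveEdges.size)
    .toVector
```

In each epoch a fresh sample like this would enter L_repulsion, while all positive edges always contribute to L_attraction.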
Value parameters
- balanceAttractionsAndRepulsions
-
if true, the number of negative samples does not affect the relative strength of attractions and repulsions (see the repulsionStrength parameter)
- data
-
each row is a sample
- device
-
device to run the optimization and KNN search (GPU or CPU)
- iterations
-
number of epochs of optimization
- k
-
number of nearest neighbors to retrieve; each sample is counted among its own nearest neighbors
- knnMinibatchSize
-
the KNN search may be batched if the whole distance matrix does not fit on the device
- lr
-
learning rate
- minDist
-
see the equations above for its definition and the UMAP paper for its effect
- negativeSampleSize
-
number of negative edges to select for each positive
- numDim
-
number of dimensions to project to
- precision
-
precision of the KNN search; the optimization always runs in double precision
- repulsionStrength
-
strength of repulsions compared to attractions
Attributes
- Returns
-
a triple of the layout, the UMAP graph (b), and the final optimization loss