Package org.nd4j.linalg.factory.ops
Class NDNN
- java.lang.Object
-
- org.nd4j.linalg.factory.ops.NDNN
-
public class NDNN extends Object
-
-
Constructor Summary
Constructor    Description
NDNN()
-
Method Summary
All Methods  Instance Methods  Concrete Methods

Modifier and Type    Method and Description

INDArray    batchNorm(INDArray input, INDArray mean, INDArray variance, INDArray gamma, INDArray beta, double epsilon, int... axis)

INDArray    biasAdd(INDArray input, INDArray bias, boolean nchw)
            Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector.

INDArray    cReLU(INDArray x)
            Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation.

INDArray    dotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray mask, boolean scaled)
            This operation performs dot product attention on the given timeseries input with the given queries.

INDArray    dropout(INDArray input, double inputRetainProbability)
            Dropout operation.

INDArray    dropoutInverted(INDArray input, double p)
            Dropout inverted operation.

INDArray    elu(INDArray x)
            Element-wise exponential linear unit (ELU) function.

INDArray    gelu(INDArray x)
            GELU activation function - Gaussian Error Linear Units (sigmoid approximation).

INDArray    hardSigmoid(INDArray x)
            Element-wise hard sigmoid function.

INDArray    hardTanh(INDArray x)
            Element-wise hard tanh function.

INDArray    hardTanhDerivative(INDArray x)
            Derivative (dOut/dIn) of the element-wise hard tanh function - hardTanh(INDArray).

INDArray    layerNorm(INDArray input, INDArray gain, boolean channelsFirst, int... dimensions)
            Apply layer normalization: y = gain * standardize(x) + bias.

INDArray    layerNorm(INDArray input, INDArray gain, INDArray bias, boolean channelsFirst, int... dimensions)
            Apply layer normalization: y = gain * standardize(x) + bias.

INDArray    leakyRelu(INDArray x, double alpha)
            Element-wise leaky ReLU function.

INDArray    leakyReluDerivative(INDArray x, double alpha)
            Leaky ReLU derivative: dOut/dIn given input.

INDArray    linear(INDArray input, INDArray weights, INDArray bias)
            Linear layer operation: out = mmul(in, w) + bias.

INDArray    logSigmoid(INDArray x)
            Element-wise log sigmoid function: out[i] = log(sigmoid(in[i])).

INDArray    logSoftmax(INDArray x)
            Log softmax activation.

INDArray    logSoftmax(INDArray x, int dimension)
            Log softmax activation along the specified dimension.

INDArray    multiHeadDotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray Wq, INDArray Wk, INDArray Wv, INDArray Wo, INDArray mask, boolean scaled)
            This performs multi-headed dot product attention on the given timeseries input.

INDArray    pad(INDArray input, INDArray padding, double constant)
            Padding operation.

INDArray    pad(INDArray input, INDArray padding, PadMode PadMode, double constant)
            Padding operation.

INDArray    preciseGelu(INDArray x)
            GELU activation function - Gaussian Error Linear Units (precise method).

INDArray    prelu(INDArray input, INDArray alpha, int... sharedAxes)
            PReLU (Parameterized Rectified Linear Unit) operation.

INDArray    relu(INDArray x, double cutoff)
            Element-wise rectified linear function with specified cutoff.

INDArray    relu6(INDArray x, double cutoff)
            Element-wise "rectified linear 6" function with specified cutoff: out[i] = min(max(in, cutoff), 6).

INDArray    reluLayer(INDArray input, INDArray weights, INDArray bias)
            ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in, w) + bias).

INDArray    selu(INDArray x)
            Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks.

INDArray    sigmoid(INDArray x)
            Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i])).

INDArray    sigmoidDerivative(INDArray x, INDArray wrt)
            Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut.

INDArray    softmax(INDArray x)
            Softmax activation, along the specified dimension.

INDArray    softmax(INDArray x, int dimension)
            Softmax activation, along the specified dimension.

INDArray    softmaxDerivative(INDArray x, INDArray wrt, int dimension)
            Softmax derivative function.

INDArray    softplus(INDArray x)
            Element-wise softplus function: out = log(exp(x) + 1).

INDArray    softsign(INDArray x)
            Element-wise softsign function: out = x / (abs(x) + 1).

INDArray    softsignDerivative(INDArray x)
            Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray).

INDArray    swish(INDArray x)
            Element-wise "swish" function: out = x * sigmoid(b*x) with b = 1.0.

INDArray    tanh(INDArray x)
            Element-wise tanh (hyperbolic tangent) operation: out = tanh(x).

INDArray[]  topK(INDArray input, double k, boolean sorted)
            Find values and indices for the largest k entries along the last dimension.
-
-
-
Method Detail
-
cReLU
public INDArray cReLU(INDArray x)
Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
batchNorm
public INDArray batchNorm(INDArray input, INDArray mean, INDArray variance, INDArray gamma, INDArray beta, double epsilon, int... axis)
- Parameters:
    input - Input variable. (NUMERIC type)
    mean - Mean value. For 1d axis, this should match input.size(axis) (NUMERIC type)
    variance - Variance value. For 1d axis, this should match input.size(axis) (NUMERIC type)
    gamma - Gamma value. For 1d axis, this should match input.size(axis) (NUMERIC type)
    beta - Beta value. For 1d axis, this should match input.size(axis) (NUMERIC type)
    epsilon - Epsilon constant for numerical stability (to avoid division by 0)
    axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations. For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC. For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
- Returns:
    output - Output variable for batch normalization (NUMERIC type)
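Example (a minimal, hypothetical sketch; the shapes, values and variable names are illustrative, and the namespace is constructed directly via the public NDNN() constructor documented above):

    import org.nd4j.linalg.api.buffer.DataType;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;
    import org.nd4j.linalg.factory.ops.NDNN;

    NDNN nn = new NDNN();

    // Illustrative NCHW activations: [minibatch=2, channels=3, height=4, width=4]
    INDArray input    = Nd4j.rand(DataType.FLOAT, 2, 3, 4, 4);
    INDArray mean     = Nd4j.zeros(DataType.FLOAT, 3);   // per-channel statistics: length matches input.size(1)
    INDArray variance = Nd4j.ones(DataType.FLOAT, 3);
    INDArray gamma    = Nd4j.ones(DataType.FLOAT, 3);
    INDArray beta     = Nd4j.zeros(DataType.FLOAT, 3);

    // axis = 1 because the channel dimension of NCHW data is dimension 1; epsilon 1e-5 is illustrative
    INDArray out = nn.batchNorm(input, mean, variance, gamma, beta, 1e-5, 1);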
-
biasAdd
public INDArray biasAdd(INDArray input, INDArray bias, boolean nchw)
Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector.
- Parameters:
    input - 4d input variable (NUMERIC type)
    bias - 1d bias (NUMERIC type)
    nchw - The format - nchw=true means [minibatch, channels, height, width] format; nchw=false means [minibatch, height, width, channels]. Unused for 2d inputs
- Returns:
    output - Output variable, after applying bias add operation (NUMERIC type)
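Example (hypothetical shapes and values; assumes the imports and the nn instance from the batchNorm example above):

    // NCHW activations plus one bias value per channel
    INDArray act  = Nd4j.rand(DataType.FLOAT, 2, 3, 4, 4);   // [minibatch, channels, height, width]
    INDArray bias = Nd4j.createFromArray(0.1f, 0.2f, 0.3f);  // length 3 = number of channels
    INDArray out  = nn.biasAdd(act, bias, true);              // nchw = true for channels-first data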
-
dotProductAttention
public INDArray dotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray mask, boolean scaled)
This operation performs dot product attention on the given timeseries input with the given queries:
out = sum(similarity(k_i, q) * v_i)
similarity(k, q) = softmax(k * q) where k * q is the dot product of k and q
Optionally with a normalization step:
similarity(k, q) = softmax(k * q / sqrt(size(q)))
See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1)
Note: This supports multiple queries at once; if only one query is available, the queries array still has to be 3D but can have queryCount = 1.
Note: keys and values are usually the same array. If you want to use the same array for both, simply pass it for both.
Note: Queries, keys and values must either all be rank 3 or all be rank 4 arrays. Mixing them doesn't work. The output rank will depend on the input rank.
- Parameters:
    queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] or 4D array of shape [batchSize, numHeads, featureKeys, queryCount] (NUMERIC type)
    keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] or 4D array of shape [batchSize, numHeads, featureKeys, timesteps] (NUMERIC type)
    values - input 3D array "values" of shape [batchSize, featureValues, timesteps] or 4D array of shape [batchSize, numHeads, featureValues, timesteps] (NUMERIC type)
    mask - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps] (NUMERIC type)
    scaled - normalization, false -> do not apply normalization, true -> apply normalization
- Returns:
    output - Attention result arrays of shape [batchSize, featureValues, queryCount] or [batchSize, numHeads, featureValues, queryCount], (optionally) attention weights of shape [batchSize, timesteps, queryCount] or [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
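Example (a hypothetical sketch with illustrative sizes; assumes the imports and the nn instance from the batchNorm example above, and that null is accepted for the optional mask):

    int batchSize = 2, featureKeys = 8, featureValues = 8, timesteps = 5, queryCount = 1;
    INDArray queries = Nd4j.rand(DataType.FLOAT, batchSize, featureKeys, queryCount);
    INDArray keys    = Nd4j.rand(DataType.FLOAT, batchSize, featureKeys, timesteps);
    INDArray values  = keys;   // keys and values are often the same array

    // scaled = true applies the sqrt(size(q)) normalization; output shape: [batchSize, featureValues, queryCount]
    INDArray out = nn.dotProductAttention(queries, keys, values, null, true);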
-
dropout
public INDArray dropout(INDArray input, double inputRetainProbability)
Dropout operation
- Parameters:
    input - Input array (NUMERIC type)
    inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
- Returns:
    output - Output (NUMERIC type)
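Example (hypothetical values; assumes the imports and the nn instance from the batchNorm example above):

    INDArray in = Nd4j.rand(DataType.FLOAT, 4, 10);
    // Retain each activation with probability 0.8, i.e. drop roughly 20% of them
    INDArray out = nn.dropout(in, 0.8);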
-
dropoutInverted
public INDArray dropoutInverted(INDArray input, double p)
Dropout inverted operation. The dropout probability p is the probability of dropping an input.
- Parameters:
    input - Input array (NUMERIC type)
    p - Probability of dropping an input (set to 0 with probability p)
- Returns:
    output - Output (NUMERIC type)
-
elu
public INDArray elu(INDArray x)
Element-wise exponential linear unit (ELU) function:
out = x if x > 0
out = a * (exp(x) - 1) if x <= 0
with constant a = 1.0
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
gelu
public INDArray gelu(INDArray x)
GELU activation function - Gaussian Error Linear Units
For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
This method uses the sigmoid approximation
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
hardSigmoid
public INDArray hardSigmoid(INDArray x)
Element-wise hard sigmoid function:
out[i] = 0 if in[i] <= -2.5
out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5
out[i] = 1 if in[i] >= 2.5
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
hardTanh
public INDArray hardTanh(INDArray x)
Element-wise hard tanh function:
out[i] = -1 if in[i] <= -1
out[i] = in[i] if -1 < in[i] < 1
out[i] = 1 if in[i] >= 1
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
hardTanhDerivative
public INDArray hardTanhDerivative(INDArray x)
Derivative (dOut/dIn) of the element-wise hard tanh function - hardTanh(INDArray)
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
layerNorm
public INDArray layerNorm(INDArray input, INDArray gain, INDArray bias, boolean channelsFirst, int... dimensions)
Apply Layer Normalization
y = gain * standardize(x) + bias
- Parameters:
    input - Input variable (NUMERIC type)
    gain - Gain (NUMERIC type)
    bias - Bias (NUMERIC type)
    channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
    dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
- Returns:
    output - Output variable (NUMERIC type)
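Example (hypothetical 2D/MLP case; assumes the imports and the nn instance from the batchNorm example above):

    // 2D activations [minibatch, nOut]; normalize over dimension 1
    INDArray act  = Nd4j.rand(DataType.FLOAT, 4, 16);
    INDArray gain = Nd4j.ones(DataType.FLOAT, 16);
    INDArray bias = Nd4j.zeros(DataType.FLOAT, 16);
    INDArray out  = nn.layerNorm(act, gain, bias, true, 1);   // channelsFirst is unused for 2D input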
-
layerNorm
public INDArray layerNorm(INDArray input, INDArray gain, boolean channelsFirst, int... dimensions)
Apply Layer Normalization
y = gain * standardize(x) + bias
- Parameters:
    input - Input variable (NUMERIC type)
    gain - Gain (NUMERIC type)
    channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
    dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
- Returns:
    output - Output variable (NUMERIC type)
-
leakyRelu
public INDArray leakyRelu(INDArray x, double alpha)
Element-wise leaky ReLU function:
out = x if x >= 0.0
out = alpha * x if x < 0.0
The alpha value is most commonly set to 0.01
- Parameters:
    x - Input variable (NUMERIC type)
    alpha - Slope for negative inputs - commonly 0.01
- Returns:
    output - Output variable (NUMERIC type)
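Example (illustrative values; assumes the imports and the nn instance from the batchNorm example above):

    INDArray x = Nd4j.createFromArray(-2.0f, -0.5f, 0.0f, 1.5f);
    // Negative inputs are scaled by alpha = 0.01: result is approximately [-0.02, -0.005, 0.0, 1.5]
    INDArray out = nn.leakyRelu(x, 0.01);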
-
leakyReluDerivative
public INDArray leakyReluDerivative(INDArray x, double alpha)
Leaky ReLU derivative: dOut/dIn given input.
- Parameters:
    x - Input variable (NUMERIC type)
    alpha - Slope for negative inputs - commonly 0.01
- Returns:
    output - Output variable (NUMERIC type)
-
linear
public INDArray linear(INDArray input, INDArray weights, INDArray bias)
Linear layer operation: out = mmul(in,w) + bias
Note that the bias array is optional
- Parameters:
    input - Input data (NUMERIC type)
    weights - Weights variable, shape [nIn, nOut] (NUMERIC type)
    bias - Optional bias variable (may be null) (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
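Example (hypothetical sizes; assumes the imports and the nn instance from the batchNorm example above):

    int nIn = 4, nOut = 3, minibatch = 2;
    INDArray in = Nd4j.rand(DataType.FLOAT, minibatch, nIn);
    INDArray w  = Nd4j.rand(DataType.FLOAT, nIn, nOut);
    INDArray b  = Nd4j.zeros(DataType.FLOAT, nOut);
    // out = in * w + b, shape [minibatch, nOut]; the bias argument may also be null
    INDArray out = nn.linear(in, w, b);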
-
logSigmoid
public INDArray logSigmoid(INDArray x)
Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
logSoftmax
public INDArray logSoftmax(INDArray x)
Log softmax activation
- Parameters:
    x - Input (NUMERIC type)
- Returns:
    output - Output (NUMERIC type)
-
logSoftmax
public INDArray logSoftmax(INDArray x, int dimension)
Log softmax activation
- Parameters:
    x - Input (NUMERIC type)
    dimension - Dimension along which to apply log softmax
- Returns:
    output - Output - log(softmax(input)) (NUMERIC type)
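Example (illustrative values; assumes the imports and the nn instance from the batchNorm example above):

    INDArray logits = Nd4j.createFromArray(new float[][]{{1f, 2f, 3f}, {0f, 0f, 0f}});
    // Normalize along dimension 1, i.e. each row; output holds log-probabilities
    INDArray logProbs = nn.logSoftmax(logits, 1);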
-
multiHeadDotProductAttention
public INDArray multiHeadDotProductAttention(INDArray queries, INDArray keys, INDArray values, INDArray Wq, INDArray Wk, INDArray Wv, INDArray Wo, INDArray mask, boolean scaled)
This performs multi-headed dot product attention on the given timeseries input
out = concat(head_1, head_2, ..., head_n) * Wo
head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v)
Optionally with normalization when calculating the attention for each head.
See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, pp. 4,5, "3.2.2 Multi-Head Attention")
This makes use of dot_product_attention OP support for rank 4 inputs.
see dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean)
- Parameters:
    queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] (NUMERIC type)
    keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] (NUMERIC type)
    values - input 3D array "values" of shape [batchSize, featureValues, timesteps] (NUMERIC type)
    Wq - input query projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
    Wk - input key projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
    Wv - input value projection weights of shape [numHeads, projectedValues, featureValues] (NUMERIC type)
    Wo - output projection weights of shape [numHeads * projectedValues, outSize] (NUMERIC type)
    mask - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps] (NUMERIC type)
    scaled - normalization, false -> do not apply normalization, true -> apply normalization
- Returns:
    output - Attention result arrays of shape [batchSize, outSize, queryCount], (optionally) attention weights of shape [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
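Example (a hypothetical sketch with illustrative sizes; assumes the imports and the nn instance from the batchNorm example above, and that null is accepted for the optional mask):

    int batchSize = 2, featureKeys = 8, featureValues = 8, timesteps = 5, queryCount = 3;
    int numHeads = 4, projectedKeys = 4, projectedValues = 4, outSize = 8;
    INDArray q  = Nd4j.rand(DataType.FLOAT, batchSize, featureKeys, queryCount);
    INDArray k  = Nd4j.rand(DataType.FLOAT, batchSize, featureKeys, timesteps);
    INDArray v  = Nd4j.rand(DataType.FLOAT, batchSize, featureValues, timesteps);
    INDArray Wq = Nd4j.rand(DataType.FLOAT, numHeads, projectedKeys, featureKeys);
    INDArray Wk = Nd4j.rand(DataType.FLOAT, numHeads, projectedKeys, featureKeys);
    INDArray Wv = Nd4j.rand(DataType.FLOAT, numHeads, projectedValues, featureValues);
    INDArray Wo = Nd4j.rand(DataType.FLOAT, numHeads * projectedValues, outSize);

    // Output shape: [batchSize, outSize, queryCount]
    INDArray out = nn.multiHeadDotProductAttention(q, k, v, Wq, Wk, Wv, Wo, null, true);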
-
pad
public INDArray pad(INDArray input, INDArray padding, PadMode PadMode, double constant)
Padding operation
- Parameters:
    input - Input tensor (NUMERIC type)
    padding - Padding value (NUMERIC type)
    PadMode - Padding format
    constant - Padding constant
- Returns:
    output - Padded input (NUMERIC type)
-
pad
public INDArray pad(INDArray input, INDArray padding, double constant)
Padding operation
- Parameters:
    input - Input tensor (NUMERIC type)
    padding - Padding value (NUMERIC type)
    constant - Padding constant
- Returns:
    output - Padded input (NUMERIC type)
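Example (a hypothetical sketch; it assumes the padding array holds one [before, after] pair per input dimension, and reuses the imports and the nn instance from the batchNorm example above):

    INDArray in = Nd4j.rand(DataType.FLOAT, 2, 3);
    // Pad 1 row before/after and 2 columns before/after (assumed layout: one [before, after] pair per dimension)
    INDArray padding = Nd4j.createFromArray(new int[][]{{1, 1}, {2, 2}});
    // Fill the padded region with the constant 0.0; expected result shape: [4, 7]
    INDArray out = nn.pad(in, padding, 0.0);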
-
preciseGelu
public INDArray preciseGelu(INDArray x)
GELU activation function - Gaussian Error Linear Units
For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
This method uses the precise computation rather than the sigmoid approximation
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
prelu
public INDArray prelu(INDArray input, INDArray alpha, int... sharedAxes)
PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:
out[i] = in[i] if in[i] >= 0
out[i] = in[i] * alpha[i] otherwise
sharedAxes allows you to share learnable parameters along axes.
For example, if the input has shape [batchSize, channels, height, width]
and you want each channel to have its own cutoff, use sharedAxes = [2, 3] and an
alpha with shape [channels].
- Parameters:
    input - Input data (NUMERIC type)
    alpha - The cutoff variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha. (NUMERIC type)
    sharedAxes - Which axes to share cutoff parameters along. (Size: AtLeast(min=1))
- Returns:
    output - Output (NUMERIC type)
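Example (the per-channel case described above, with hypothetical shapes; assumes the imports and the nn instance from the batchNorm example above):

    // NCHW input; one alpha per channel, shared across height and width
    INDArray in    = Nd4j.rand(DataType.FLOAT, 2, 3, 4, 4);
    INDArray alpha = Nd4j.createFromArray(0.01f, 0.05f, 0.1f);   // shape [channels]
    INDArray out   = nn.prelu(in, alpha, 2, 3);                   // share alpha along axes 2 and 3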
-
relu
public INDArray relu(INDArray x, double cutoff)
Element-wise rectified linear function with specified cutoff:
out[i] = in[i] if in[i] >= cutoff
out[i] = 0 otherwise
- Parameters:
    x - Input (NUMERIC type)
    cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0
- Returns:
    output - Output (NUMERIC type)
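Example (illustrative values; assumes the imports and the nn instance from the batchNorm example above):

    INDArray x = Nd4j.createFromArray(-1.0f, 0.0f, 2.5f);
    // Standard ReLU with cutoff 0: result is [0.0, 0.0, 2.5]
    INDArray out = nn.relu(x, 0.0);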
-
relu6
public INDArray relu6(INDArray x, double cutoff)
Element-wise "rectified linear 6" function with specified cutoff:
out[i] = min(max(in, cutoff), 6)
- Parameters:
    x - Input (NUMERIC type)
    cutoff - Cutoff value for ReLU operation. Usually 0
- Returns:
    output - Output (NUMERIC type)
-
reluLayer
public INDArray reluLayer(INDArray input, INDArray weights, INDArray bias)
ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)
Note that the bias array is optional
- Parameters:
    input - Input data (NUMERIC type)
    weights - Weights variable (NUMERIC type)
    bias - Optional bias variable (may be null) (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
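Example (hypothetical sizes; assumes the imports and the nn instance from the batchNorm example above):

    INDArray in = Nd4j.rand(DataType.FLOAT, 2, 4);   // [minibatch, nIn]
    INDArray w  = Nd4j.rand(DataType.FLOAT, 4, 3);   // [nIn, nOut]
    INDArray b  = Nd4j.zeros(DataType.FLOAT, 3);
    // relu(in * w + b), shape [2, 3]; the bias argument may also be null
    INDArray out = nn.reluLayer(in, w, b);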
-
selu
public INDArray selu(INDArray x)
Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks
out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0
Uses default scale and alpha values.
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
sigmoid
public INDArray sigmoid(INDArray x)
Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
sigmoidDerivative
public INDArray sigmoidDerivative(INDArray x, INDArray wrt)
Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
- Parameters:
    x - Input Variable (NUMERIC type)
    wrt - Gradient at the output - dL/dOut. Must have same shape as the input (NUMERIC type)
- Returns:
    output - Output (gradient at input of sigmoid) (NUMERIC type)
-
softmax
public INDArray softmax(INDArray x, int dimension)
Softmax activation, along the specified dimension
- Parameters:
    x - Input (NUMERIC type)
    dimension - Dimension along which to apply softmax
- Returns:
    output - Output variable (NUMERIC type)
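Example (illustrative values; assumes the imports and the nn instance from the batchNorm example above):

    INDArray logits = Nd4j.createFromArray(new float[][]{{1f, 2f, 3f}, {3f, 2f, 1f}});
    // Apply softmax along dimension 1, so each row of the result sums to 1.0
    INDArray probs = nn.softmax(logits, 1);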
-
softmax
public INDArray softmax(INDArray x)
Softmax activation, along the specified dimension
- Parameters:
    x - Input (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
softmaxDerivative
public INDArray softmaxDerivative(INDArray x, INDArray wrt, int dimension)
Softmax derivative function
- Parameters:
    x - Softmax input (NUMERIC type)
    wrt - Gradient at output, dL/dx (NUMERIC type)
    dimension - Softmax dimension
- Returns:
    output - Output (NUMERIC type)
-
softplus
public INDArray softplus(INDArray x)
Element-wise softplus function: out = log(exp(x) + 1)
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
softsign
public INDArray softsign(INDArray x)
Element-wise softsign function: out = x / (abs(x) + 1)
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
softsignDerivative
public INDArray softsignDerivative(INDArray x)
Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray)
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output (NUMERIC type)
-
swish
public INDArray swish(INDArray x)
Element-wise "swish" function: out = x * sigmoid(b*x) with b=1.0
See: https://arxiv.org/abs/1710.05941
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
tanh
public INDArray tanh(INDArray x)
Element-wise tanh (hyperbolic tangent) operation: out = tanh(x)
- Parameters:
    x - Input variable (NUMERIC type)
- Returns:
    output - Output variable (NUMERIC type)
-
-