Class NDNN


  • public class NDNN
    extends Object
    • Constructor Detail

      • NDNN

        public NDNN()
    • Method Detail

      • cReLU

        public INDArray cReLU​(INDArray x)
        Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
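For illustration, a minimal usage sketch (assumes the standard Nd4j factory methods and imports org.nd4j.linalg.api.ndarray.INDArray and org.nd4j.linalg.factory.Nd4j; the axis along which the two halves are concatenated is not specified above, so the shape printout is indicative only):

    // assumes: import org.nd4j.linalg.api.ndarray.INDArray; import org.nd4j.linalg.factory.Nd4j;
    NDNN nn = new NDNN();
    INDArray act = Nd4j.createFromArray(-2f, -1f, 1f, 2f).reshape(2, 2);
    INDArray out = nn.cReLU(act);
    // Positive and negative parts are concatenated, so the activation
    // depth of the output is double that of the input.
    System.out.println(java.util.Arrays.toString(out.shape()));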
      • batchNorm

        public INDArray batchNorm​(INDArray input,
                                  INDArray mean,
                                  INDArray variance,
                                  INDArray gamma,
                                  INDArray beta,
                                  double epsilon,
                                  int... axis)
        Neural network batch normalization operation.
        For details, see https://arxiv.org/abs/1502.03167
        Parameters:
        input - Input variable. (NUMERIC type)
        mean - Mean value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        variance - Variance value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        gamma - Gamma value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        beta - Beta value. For 1d axis, this should match input.size(axis) (NUMERIC type)
        epsilon - Epsilon constant for numerical stability (to avoid division by 0)
        axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations. For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
        Returns:
        output variable for batch normalization (NUMERIC type)
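A hedged sketch for NCHW activations (axis = 1); the per-channel statistics arrays are illustrative rather than computed from real data:

    NDNN nn = new NDNN();
    INDArray act = Nd4j.rand(new int[]{2, 3, 4, 4});             // NCHW: [minibatch, channels, height, width]
    INDArray mean = Nd4j.createFromArray(0f, 0f, 0f);            // per-channel, matches input.size(1) == 3
    INDArray variance = Nd4j.createFromArray(1f, 1f, 1f);
    INDArray gamma = Nd4j.createFromArray(1f, 1f, 1f);
    INDArray beta = Nd4j.createFromArray(0f, 0f, 0f);
    INDArray out = nn.batchNorm(act, mean, variance, gamma, beta, 1e-5, 1);  // axis=1 for NCHW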
      • biasAdd

        public INDArray biasAdd​(INDArray input,
                                INDArray bias,
                                boolean nchw)
        Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector
        Parameters:
        input - 4d input variable (NUMERIC type)
        bias - 1d bias (NUMERIC type)
        nchw - The format - nchw=true means [minibatch, channels, height, width] format; nchw=false - [minibatch, height, width, channels]. Unused for 2d inputs
        Returns:
        output Output variable, after applying bias add operation (NUMERIC type)
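A short sketch for 4d NCHW activations (same assumed Nd4j factory methods as above; one bias value per channel):

    NDNN nn = new NDNN();
    INDArray act = Nd4j.rand(new int[]{2, 3, 4, 4});           // [minibatch, channels, height, width]
    INDArray bias = Nd4j.createFromArray(0.1f, 0.2f, 0.3f);    // 1d bias, length == channels
    INDArray out = nn.biasAdd(act, bias, true);                // nchw=true: bias added along dimension 1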
      • dotProductAttention

        public INDArray dotProductAttention​(INDArray queries,
                                            INDArray keys,
                                            INDArray values,
                                            INDArray mask,
                                            boolean scaled)
        This operation performs dot product attention on the given timeseries input with the given queries
        out = sum(similarity(k_i, q) * v_i)

similarity(k, q) = softmax(k * q) where k * q is the dot product of k and q

        Optionally with normalization step:
similarity(k, q) = softmax(k * q / sqrt(size(q)))

        See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1)

        Note: This supports multiple queries at once, if only one query is available the queries vector still has to
        be 3D but can have queryCount = 1

Note: keys and values are usually the same array. If you want to use the same array for both, simply pass it for
both.

        Note: Queries, keys and values must either be all rank 3 or all rank 4 arrays. Mixing them doesn't work. The
        output rank will depend on the input rank.
        Parameters:
        queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] or 4D array of shape [batchSize, numHeads, featureKeys, queryCount] (NUMERIC type)
        keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] or 4D array of shape [batchSize, numHeads, featureKeys, timesteps] (NUMERIC type)
        values - input 3D array "values" of shape [batchSize, featureValues, timesteps] or 4D array of shape [batchSize, numHeads, featureValues, timesteps] (NUMERIC type)
        mask - OPTIONAL; array that defines which values should be skipped of shape [batchSize, timesteps] (NUMERIC type)
        scaled - normalization, false -> do not apply normalization, true -> apply normalization
        Returns:
        output Attention result arrays of shape [batchSize, featureValues, queryCount] or [batchSize, numHeads, featureValues, queryCount], (optionally) Attention Weights of shape [batchSize, timesteps, queryCount] or [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
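A minimal rank-3 sketch (assuming the optional mask may be passed as null; dimension names follow the parameter documentation above):

    NDNN nn = new NDNN();
    int batchSize = 1, featureKeys = 4, timesteps = 5, queryCount = 2;
    INDArray queries = Nd4j.rand(new int[]{batchSize, featureKeys, queryCount});
    INDArray keys    = Nd4j.rand(new int[]{batchSize, featureKeys, timesteps});
    INDArray values  = keys;   // keys and values are often the same array - pass it for both
    INDArray out = nn.dotProductAttention(queries, keys, values, null, true);  // scaled=true: softmax(k*q/sqrt(size(q)))
    // out shape: [batchSize, featureValues, queryCount] (here featureValues == featureKeys)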
      • dropout

        public INDArray dropout​(INDArray input,
                                double inputRetainProbability)
        Dropout operation
        Parameters:
        input - Input array (NUMERIC type)
        inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
        Returns:
        output Output (NUMERIC type)
      • dropoutInverted

        public INDArray dropoutInverted​(INDArray input,
                                        double p)
        Dropout inverted operation. The dropout probability p is the probability of dropping an input.
        Parameters:
        input - Input array (NUMERIC type)
        p - Probability of dropping an input (set to 0 with probability p)
        Returns:
        output Output (NUMERIC type)
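Note the opposite probability conventions of the two variants: dropout takes the probability of retaining an input, dropoutInverted the probability of dropping one. A hedged sketch (conventionally, inverted dropout also rescales the surviving activations by 1/(1-p), though that is not stated above):

    NDNN nn = new NDNN();
    INDArray act = Nd4j.rand(2, 5);
    INDArray a = nn.dropout(act, 0.8);          // retain each input with probability 0.8
    INDArray b = nn.dropoutInverted(act, 0.2);  // drop each input with probability 0.2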
      • elu

        public INDArray elu​(INDArray x)
        Element-wise exponential linear unit (ELU) function:
        out = x if x > 0
        out = a * (exp(x) - 1) if x <= 0
        with constant a = 1.0


        See: https://arxiv.org/abs/1511.07289

        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • gelu

        public INDArray gelu​(INDArray x)
        GELU activation function - Gaussian Error Linear Units
        For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
        This method uses the sigmoid approximation
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • hardSigmoid

        public INDArray hardSigmoid​(INDArray x)
        Element-wise hard sigmoid function:
        out[i] = 0 if in[i] <= -2.5
out[i] = 0.2*in[i]+0.5 if -2.5 < in[i] < 2.5
        out[i] = 1 if in[i] >= 2.5
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • hardTanh

        public INDArray hardTanh​(INDArray x)
        Element-wise hard tanh function:
        out[i] = -1 if in[i] <= -1
out[i] = in[i] if -1 < in[i] < 1
        out[i] = 1 if in[i] >= 1
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • hardTanhDerivative

        public INDArray hardTanhDerivative​(INDArray x)
Derivative (dOut/dIn) of the element-wise hard tanh function - hardTanh(INDArray)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • layerNorm

        public INDArray layerNorm​(INDArray input,
                                  INDArray gain,
                                  INDArray bias,
                                  boolean channelsFirst,
                                  int... dimensions)
        Apply Layer Normalization

        y = gain * standardize(x) + bias
        Parameters:
        input - Input variable (NUMERIC type)
        gain - Gain (NUMERIC type)
        bias - Bias (NUMERIC type)
        channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
        dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
        Returns:
        output Output variable (NUMERIC type)
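A small 2d/MLP sketch (gain and bias are per-feature, built with the assumed Nd4j.createFromArray factory; channelsFirst is unused for 2D input):

    NDNN nn = new NDNN();
    INDArray act = Nd4j.rand(2, 4);                            // [minibatch, layerSize]
    INDArray gain = Nd4j.createFromArray(1f, 1f, 1f, 1f);
    INDArray bias = Nd4j.createFromArray(0f, 0f, 0f, 0f);
    INDArray out = nn.layerNorm(act, gain, bias, true, 1);     // dimension=1 for 2d/MLP data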
      • layerNorm

        public INDArray layerNorm​(INDArray input,
                                  INDArray gain,
                                  boolean channelsFirst,
                                  int... dimensions)
        Apply Layer Normalization

y = gain * standardize(x)
        Parameters:
        input - Input variable (NUMERIC type)
        gain - Gain (NUMERIC type)
        channelsFirst - For 2D input - unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
        dimensions - Dimensions to perform layer norm over - dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
        Returns:
        output Output variable (NUMERIC type)
      • leakyRelu

        public INDArray leakyRelu​(INDArray x,
                                  double alpha)
Element-wise leaky ReLU function:
out = x if x >= 0.0
out = alpha * x if x < 0.0
Alpha value is most commonly set to 0.01
Parameters:
x - Input variable (NUMERIC type)
alpha - Scaling factor (slope) for negative inputs - commonly 0.01
        Returns:
        output Output variable (NUMERIC type)
      • leakyReluDerivative

        public INDArray leakyReluDerivative​(INDArray x,
                                            double alpha)
        Leaky ReLU derivative: dOut/dIn given input.
        Parameters:
        x - Input variable (NUMERIC type)
alpha - Scaling factor (slope) for negative inputs - commonly 0.01
        Returns:
        output Output variable (NUMERIC type)
      • linear

        public INDArray linear​(INDArray input,
                               INDArray weights,
                               INDArray bias)
        Linear layer operation: out = mmul(in,w) + bias
        Note that bias array is optional
        Parameters:
        input - Input data (NUMERIC type)
        weights - Weights variable, shape [nIn, nOut] (NUMERIC type)
        bias - Optional bias variable (may be null) (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
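A shape-oriented sketch (out = mmul(in, w) + bias; the bias may be null):

    NDNN nn = new NDNN();
    INDArray in = Nd4j.rand(3, 4);                     // [minibatch, nIn]
    INDArray w  = Nd4j.rand(4, 2);                     // [nIn, nOut]
    INDArray b  = Nd4j.createFromArray(0.1f, 0.2f);    // [nOut], or null to skip the bias
    INDArray out = nn.linear(in, w, b);                // shape [3, 2]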
      • logSigmoid

        public INDArray logSigmoid​(INDArray x)
Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • logSoftmax

        public INDArray logSoftmax​(INDArray x)
        Log softmax activation
        Parameters:
        x - (NUMERIC type)
        Returns:
        output (NUMERIC type)
      • logSoftmax

        public INDArray logSoftmax​(INDArray x,
                                   int dimension)
        Log softmax activation
        Parameters:
        x - Input (NUMERIC type)
        dimension - Dimension along which to apply log softmax
        Returns:
        output Output - log(softmax(input)) (NUMERIC type)
      • multiHeadDotProductAttention

        public INDArray multiHeadDotProductAttention​(INDArray queries,
                                                     INDArray keys,
                                                     INDArray values,
                                                     INDArray Wq,
                                                     INDArray Wk,
                                                     INDArray Wv,
                                                     INDArray Wo,
                                                     INDArray mask,
                                                     boolean scaled)
        This performs multi-headed dot product attention on the given timeseries input
        out = concat(head_1, head_2, ..., head_n) * Wo
        head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v)

        Optionally with normalization when calculating the attention for each head.

        See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, pp. 4,5, "3.2.2 Multi-Head Attention")

        This makes use of dot_product_attention OP support for rank 4 inputs.
see dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean)
        Parameters:
        queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] (NUMERIC type)
        keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] (NUMERIC type)
        values - input 3D array "values" of shape [batchSize, featureValues, timesteps] (NUMERIC type)
        Wq - input query projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
        Wk - input key projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
        Wv - input value projection weights of shape [numHeads, projectedValues, featureValues] (NUMERIC type)
        Wo - output projection weights of shape [numHeads * projectedValues, outSize] (NUMERIC type)
        mask - OPTIONAL; array that defines which values should be skipped of shape [batchSize, timesteps] (NUMERIC type)
        scaled - normalization, false -> do not apply normalization, true -> apply normalization
        Returns:
        output Attention result arrays of shape [batchSize, outSize, queryCount] (optionally) Attention Weights of shape [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
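A hedged shape sketch wiring up the projection weights (dimension names follow the parameter documentation; the optional mask is passed as null):

    NDNN nn = new NDNN();
    int batchSize = 1, featureKeys = 4, featureValues = 4, timesteps = 5, queryCount = 2;
    int numHeads = 2, projectedKeys = 3, projectedValues = 3, outSize = 4;
    INDArray q  = Nd4j.rand(new int[]{batchSize, featureKeys, queryCount});
    INDArray k  = Nd4j.rand(new int[]{batchSize, featureKeys, timesteps});
    INDArray v  = Nd4j.rand(new int[]{batchSize, featureValues, timesteps});
    INDArray wq = Nd4j.rand(new int[]{numHeads, projectedKeys, featureKeys});
    INDArray wk = Nd4j.rand(new int[]{numHeads, projectedKeys, featureKeys});
    INDArray wv = Nd4j.rand(new int[]{numHeads, projectedValues, featureValues});
    INDArray wo = Nd4j.rand(new int[]{numHeads * projectedValues, outSize});
    INDArray out = nn.multiHeadDotProductAttention(q, k, v, wq, wk, wv, wo, null, true);
    // out shape: [batchSize, outSize, queryCount]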
      • pad

        public INDArray pad​(INDArray input,
                            INDArray padding,
                            PadMode PadMode,
                            double constant)
        Padding operation
        Parameters:
        input - Input tensor (NUMERIC type)
padding - Padding specification: the amount of padding to apply at the start and end of each dimension (NUMERIC type)
        PadMode - Padding format
        constant - Padding constant
        Returns:
        output Padded input (NUMERIC type)
      • pad

        public INDArray pad​(INDArray input,
                            INDArray padding,
                            double constant)
        Padding operation
        Parameters:
        input - Input tensor (NUMERIC type)
padding - Padding specification: the amount of padding to apply at the start and end of each dimension (NUMERIC type)
        constant - Padding constant
        Returns:
        output Padded input (NUMERIC type)
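A constant-padding sketch (this assumes the padding array follows the common [rank, 2] convention of before/after amounts per dimension):

    NDNN nn = new NDNN();
    INDArray in = Nd4j.rand(2, 3);
    INDArray padding = Nd4j.createFromArray(new int[][]{{1, 1}, {2, 2}});  // 1 row each side, 2 columns each side
    INDArray out = nn.pad(in, padding, 0.0);                               // pad with the constant 0
    // out shape: [4, 7]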
      • preciseGelu

        public INDArray preciseGelu​(INDArray x)
        GELU activation function - Gaussian Error Linear Units
        For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
        This method uses the precise method
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • prelu

        public INDArray prelu​(INDArray input,
                              INDArray alpha,
                              int... sharedAxes)
        PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:
        out[i] = in[i] if in[i] >= 0
        out[i] = in[i] * alpha[i] otherwise

sharedAxes allows you to share learnable parameters along axes.
For example, if the input has shape [batchSize, channels, height, width]
and you want each channel to have its own alpha value, use sharedAxes = [2, 3] and an
alpha with shape [channels].
        Parameters:
        input - Input data (NUMERIC type)
alpha - The learnable alpha (negative slope) variable. Note that the 0th (batch) dimension, whether it is an actual batch dimension or not, should not be part of alpha. (NUMERIC type)
sharedAxes - Which axes to share alpha parameters along. (Size: AtLeast(min=1))
        Returns:
        output Output (NUMERIC type)
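A per-channel sketch matching the example in the description above (alpha has shape [channels]; height and width share the parameters):

    NDNN nn = new NDNN();
    INDArray act = Nd4j.rand(new int[]{2, 3, 4, 4}).subi(0.5);  // [batchSize, channels, height, width], mixed signs
    INDArray alpha = Nd4j.createFromArray(0.01f, 0.05f, 0.1f);  // one learnable slope per channel
    INDArray out = nn.prelu(act, alpha, 2, 3);                  // sharedAxes = [2, 3]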
      • relu

        public INDArray relu​(INDArray x,
                             double cutoff)
        Element-wise rectified linear function with specified cutoff:
        out[i] = in[i] if in[i] >= cutoff
        out[i] = 0 otherwise
        Parameters:
        x - Input (NUMERIC type)
        cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0
        Returns:
        output Output (NUMERIC type)
      • relu6

        public INDArray relu6​(INDArray x,
                              double cutoff)
        Element-wise "rectified linear 6" function with specified cutoff:
out[i] = min(max(in[i], cutoff), 6)
        Parameters:
        x - Input (NUMERIC type)
        cutoff - Cutoff value for ReLU operation. Usually 0
        Returns:
        output Output (NUMERIC type)
      • reluLayer

        public INDArray reluLayer​(INDArray input,
                                  INDArray weights,
                                  INDArray bias)
        ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)
        Note that bias array is optional
        Parameters:
        input - Input data (NUMERIC type)
        weights - Weights variable (NUMERIC type)
        bias - Optional bias variable (may be null) (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • selu

        public INDArray selu​(INDArray x)
Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks (https://arxiv.org/abs/1706.02515)

out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0
Uses default scale and alpha values.
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • sigmoid

        public INDArray sigmoid​(INDArray x)
        Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • sigmoidDerivative

        public INDArray sigmoidDerivative​(INDArray x,
                                          INDArray wrt)
        Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
        Parameters:
        x - Input Variable (NUMERIC type)
        wrt - Gradient at the output - dL/dOut. Must have same shape as the input (NUMERIC type)
        Returns:
        output Output (gradient at input of sigmoid) (NUMERIC type)
      • softmax

        public INDArray softmax​(INDArray x,
                                int dimension)
        Softmax activation, along the specified dimension
        Parameters:
        x - Input (NUMERIC type)
        dimension - Dimension along which to apply softmax
        Returns:
        output Output variable (NUMERIC type)
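A quick check that each slice along the chosen dimension normalizes to 1:

    NDNN nn = new NDNN();
    INDArray logits = Nd4j.createFromArray(new float[][]{{1f, 2f, 3f}, {1f, 1f, 1f}});
    INDArray probs = nn.softmax(logits, 1);   // normalize along dimension 1: each row sums to 1.0
    System.out.println(probs.sum(1));         // approximately [1.0, 1.0]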
      • softmax

        public INDArray softmax​(INDArray x)
Softmax activation, along the default dimension
        Parameters:
        x - Input (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • softmaxDerivative

        public INDArray softmaxDerivative​(INDArray x,
                                          INDArray wrt,
                                          int dimension)
        Softmax derivative function
        Parameters:
        x - Softmax input (NUMERIC type)
wrt - Gradient at output, dL/dOut (NUMERIC type)
        dimension - Softmax dimension
        Returns:
        output (NUMERIC type)
      • softplus

        public INDArray softplus​(INDArray x)
        Element-wise softplus function: out = log(exp(x) + 1)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • softsign

        public INDArray softsign​(INDArray x)
        Element-wise softsign function: out = x / (abs(x) + 1)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • softsignDerivative

        public INDArray softsignDerivative​(INDArray x)
        Element-wise derivative (dOut/dIn) of the softsign function softsign(INDArray)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output (NUMERIC type)
      • swish

        public INDArray swish​(INDArray x)
        Element-wise "swish" function: out = x * sigmoid(b*x) with b=1.0
        See: https://arxiv.org/abs/1710.05941
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • tanh

        public INDArray tanh​(INDArray x)
Element-wise tanh (hyperbolic tangent) operation: out = tanh(x)
        Parameters:
        x - Input variable (NUMERIC type)
        Returns:
        output Output variable (NUMERIC type)
      • topK

        public INDArray[] topK​(INDArray input,
                               double k,
                               boolean sorted)
Find values and indices for the largest k entries along the last dimension.
Parameters:
input - Input data (NUMERIC type)
k - The number of values to return
sorted - Whether to return the values sorted or not
Returns:
output Values and indices arrays (NUMERIC type)
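A small sketch (this assumes the values array is returned first and the indices array second, per the "values and indices" wording):

    NDNN nn = new NDNN();
    INDArray in = Nd4j.createFromArray(new float[][]{{0.3f, 0.9f, 0.1f, 0.5f}});
    INDArray[] result = nn.topK(in, 2, true);  // largest 2 entries along the last dimension, sorted
    INDArray values = result[0];               // [[0.9, 0.5]]
    INDArray indices = result[1];              // [[1, 3]]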