Interface DataSet

    • Method Detail

      • getRange

        DataSet getRange​(int from,
                         int to)
      • load

        void load​(InputStream from)
        Load the contents of the DataSet from the specified InputStream. The current contents of the DataSet (if any) will be replaced.
        The InputStream should contain a DataSet that has been serialized with save(OutputStream)
        Parameters:
        from - InputStream to load the DataSet from
      • load

        void load​(File from)
        Load the contents of the DataSet from the specified File. The current contents of the DataSet (if any) will be replaced.
        The File should contain a DataSet that has been serialized with save(File)
        Parameters:
        from - File to load the DataSet from
      • save

        void save​(OutputStream to)
        Write the contents of this DataSet to the specified OutputStream
        Parameters:
        to - OutputStream to save the DataSet to
      • save

        void save​(File to)
        Save this DataSet to a file. Can be loaded again using load(File)
        Parameters:
        to - File to save the DataSet to
      • getFeatures

        INDArray getFeatures()
        Returns the features array for the DataSet
        Returns:
        features array
      • setFeatures

        void setFeatures​(INDArray features)
        Set the features array for the DataSet
        Parameters:
        features - Features to set
      • labelCounts

        Map<Integer,​Double> labelCounts()
        Calculate and return a count of each label, by index. Assumes labels are a one-hot INDArray, for classification
        Returns:
        Map of counts
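        The counting described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: it uses double[][] in place of INDArray, the class and method names are hypothetical, and it assumes each label row is one-hot (the label index is the position of the largest value). Whether the library reports labels with a zero count is not stated here; this sketch omits them.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the counting that labelCounts() describes.
class LabelCountsSketch {
    // Counts how many examples carry each label index, given one-hot label rows.
    static Map<Integer, Double> labelCounts(double[][] oneHotLabels) {
        Map<Integer, Double> counts = new TreeMap<>();
        for (double[] row : oneHotLabels) {
            // Assumption: the label index is the position of the largest value in the row.
            int best = 0;
            for (int j = 1; j < row.length; j++) {
                if (row[j] > row[best]) {
                    best = j;
                }
            }
            counts.merge(best, 1.0, Double::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] labels = {
            {1, 0, 0},
            {0, 0, 1},
            {1, 0, 0},
            {0, 1, 0}
        };
        System.out.println(labelCounts(labels)); // prints {0=2.0, 1=1.0, 2=1.0}
    }
}
```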
      • copy

        DataSet copy()
        Create a copy of the DataSet
        Returns:
        Copy of the DataSet
      • reshape

        DataSet reshape​(int rows,
                        int cols)
      • multiplyBy

        void multiplyBy​(double num)
        Multiply the features by a scalar
      • divideBy

        void divideBy​(int num)
        Divide the features by a scalar
      • shuffle

        void shuffle()
        Shuffle the order of the rows in the DataSet. Note that this generally won't make any difference in practice unless the DataSet is later split.
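        As a rough illustration of what shuffling rows means, the plain-Java sketch below applies the same swaps to a features array and a labels array so that each example stays paired with its label. The class name, the double[][] representation, and the use of a Fisher-Yates shuffle are assumptions made for this example; the library's own algorithm is not described here.

```java
import java.util.Random;

// Hypothetical sketch: shuffle examples while keeping features and labels aligned.
class ShuffleSketch {
    static void shuffle(double[][] features, double[][] labels, Random rng) {
        if (features.length != labels.length) {
            throw new IllegalArgumentException("features and labels must have the same number of rows");
        }
        // Fisher-Yates: walk backwards, swapping row i with a random row j <= i.
        for (int i = features.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            double[] f = features[i];
            features[i] = features[j];
            features[j] = f;
            // Apply the identical swap to the labels so pairs stay together.
            double[] l = labels[i];
            labels[i] = labels[j];
            labels[j] = l;
        }
    }

    public static void main(String[] args) {
        double[][] features = {{0}, {1}, {2}, {3}};
        double[][] labels = {{0}, {10}, {20}, {30}};
        shuffle(features, labels, new Random(123));
        for (int i = 0; i < features.length; i++) {
            System.out.println(features[i][0] + " -> " + labels[i][0]);
        }
    }
}
```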
      • squishToRange

        void squishToRange​(double min,
                           double max)
      • scaleMinAndMax

        void scaleMinAndMax​(double min,
                            double max)
      • scale

        void scale()
      • addFeatureVector

        void addFeatureVector​(INDArray toAdd)
      • addFeatureVector

        void addFeatureVector​(INDArray feature,
                              int example)
      • normalize

        void normalize()
        Normalize this DataSet to mean 0, stdev 1 per input. This calculates statistics based on the values in a single DataSet only. For normalization over multiple DataSet objects, use NormalizerStandardize
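        The per-input standardization described above can be sketched in plain Java as follows. This is an illustration only: the class name and double[][] layout (rows are examples, columns are inputs) are hypothetical, and two choices are assumptions of this sketch rather than documented library behavior: the population standard deviation (dividing by n) is used, and constant columns are centered but not divided.

```java
// Hypothetical sketch of standardizing each input column to mean 0, stdev 1.
class NormalizeSketch {
    // Standardizes each column of features in place: (x - mean) / stdev.
    static void normalize(double[][] features) {
        int rows = features.length;
        if (rows == 0) {
            return;
        }
        int cols = features[0].length;
        for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            for (double[] row : features) {
                sum += row[j];
            }
            double mean = sum / rows;

            double squared = 0.0;
            for (double[] row : features) {
                squared += (row[j] - mean) * (row[j] - mean);
            }
            // Assumption: population standard deviation (divide by n).
            double std = Math.sqrt(squared / rows);

            for (double[] row : features) {
                // Assumption: constant columns are only centered, to avoid dividing by zero.
                row[j] = (std == 0.0) ? row[j] - mean : (row[j] - mean) / std;
            }
        }
    }

    public static void main(String[] args) {
        double[][] features = {{1, 10}, {2, 10}, {3, 10}};
        normalize(features);
        for (double[] row : features) {
            System.out.println(row[0] + ", " + row[1]);
        }
    }
}
```

        Because the statistics come from this one array only, two DataSets normalized this way separately would generally end up on different scales, which is the reason the text points to NormalizerStandardize for multi-DataSet use.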
      • binarize

        void binarize()
      • binarize

        void binarize​(double cutoff)
      • numInputs

        int numInputs()
        Number of input values - i.e., size of the features INDArray per example
      • validate

        void validate()
      • outcome

        int outcome()
      • setNewNumberOfLabels

        void setNewNumberOfLabels​(int labels)
      • setOutcome

        void setOutcome​(int example,
                        int label)
      • filterBy

        DataSet filterBy​(int[] labels)
      • filterAndStrip

        void filterAndStrip​(int[] labels)
      • sortAndBatchByNumLabels

        List<DataSet> sortAndBatchByNumLabels()
      • asList

        List<DataSet> asList()
        Extract each example in the DataSet into its own DataSet object, and return all of them as a list
        Returns:
        List of DataSet objects, each with 1 example only
      • setLabels

        void setLabels​(INDArray labels)
      • sortByLabel

        void sortByLabel()
      • addRow

        void addRow​(DataSet d,
                    int i)
      • sample

        DataSet sample​(int numSamples)
      • sample

        DataSet sample​(int numSamples,
                       boolean withReplacement)
      • sample

        DataSet sample​(int numSamples,
                       Random rng,
                       boolean withReplacement)
      • roundToTheNearest

        void roundToTheNearest​(int roundTo)
      • numOutcomes

        int numOutcomes()
        Returns the number of outcomes (size of the labels array for each example)
      • numExamples

        int numExamples()
        Number of examples in the DataSet
      • getLabelNamesList

        List<String> getLabelNamesList()
      • getLabelName

        String getLabelName​(int idx)
      • setLabelNames

        void setLabelNames​(List<String> labelNames)
      • setColumnNames

        void setColumnNames​(List<String> columnNames)
      • splitTestAndTrain

        SplitTestAndTrain splitTestAndTrain​(double fractionTrain)
        Split the DataSet into two DataSets randomly
        Parameters:
        fractionTrain - Fraction (in range 0 to 1) of examples to be returned in the training DataSet object
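        One way to picture a random split by fraction is the plain-Java sketch below, which works on example indices rather than on a real DataSet. The class name, the seed parameter, and the choice to round the training count down are assumptions of this sketch; how the library rounds or randomizes is not stated here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: split example indices into a training part and a test part.
class SplitSketch {
    // Returns two lists: element 0 holds training indices, element 1 holds test indices.
    static List<List<Integer>> split(int numExamples, double fractionTrain, long seed) {
        if (fractionTrain < 0.0 || fractionTrain > 1.0) {
            throw new IllegalArgumentException("fractionTrain must be in the range 0 to 1");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            indices.add(i);
        }
        // Randomize the order so the split does not depend on the original row order.
        Collections.shuffle(indices, new Random(seed));

        // Assumption: the training count is rounded down.
        int numTrain = (int) (fractionTrain * numExamples);

        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, numTrain)));
        parts.add(new ArrayList<>(indices.subList(numTrain, numExamples)));
        return parts;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(10, 0.5, 42L);
        System.out.println("train: " + parts.get(0));
        System.out.println("test:  " + parts.get(1));
    }
}
```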
      • getFeaturesMaskArray

        INDArray getFeaturesMaskArray()
        Input mask array: a mask array for input, where each value is in {0,1} in order to specify whether an input is actually present or not. Typically used for situations such as RNNs with variable length inputs
        Returns:
        Input mask array
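        To make the {0,1} mask concrete, the plain-Java sketch below builds a mask for a batch of variable-length sequences. It assumes a mask of shape [examples x time steps] with padding at the end of each sequence; the class name and the double[][] representation are hypothetical and stand in for an INDArray.

```java
import java.util.Arrays;

// Hypothetical sketch: build a features mask for padded, variable-length sequences.
class FeaturesMaskSketch {
    // mask[i][t] is 1.0 where example i has a real input at step t, 0.0 where it is padding.
    static double[][] featuresMask(int[] lengths) {
        int maxLength = 0;
        for (int length : lengths) {
            maxLength = Math.max(maxLength, length);
        }
        double[][] mask = new double[lengths.length][maxLength];
        for (int i = 0; i < lengths.length; i++) {
            for (int t = 0; t < lengths[i]; t++) {
                mask[i][t] = 1.0;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        // Three sequences of lengths 3, 1 and 2, padded to length 3.
        double[][] mask = featuresMask(new int[]{3, 1, 2});
        System.out.println(Arrays.deepToString(mask));
        // prints [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
    }
}
```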
      • setFeaturesMaskArray

        void setFeaturesMaskArray​(INDArray inputMask)
        Set the features mask array in this DataSet
      • getLabelsMaskArray

        INDArray getLabelsMaskArray()
        Labels (output) mask array: a mask array for output, where each value is in {0,1} in order to specify whether an output is actually present or not. Typically used for situations such as RNNs with variable length inputs or many-to-one situations.
        Returns:
        Labels (output) mask array
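        For the many-to-one case mentioned above, a labels mask typically marks a single time step per example. The plain-Java sketch below assumes that the label sits at the last real time step of each sequence and that the mask has shape [examples x time steps]; both are assumptions of this example, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: labels mask for a many-to-one sequence task.
class LabelsMaskSketch {
    // mask[i][t] is 1.0 only at the last real time step of example i.
    static double[][] lastStepMask(int[] lengths, int maxLength) {
        double[][] mask = new double[lengths.length][maxLength];
        for (int i = 0; i < lengths.length; i++) {
            if (lengths[i] > maxLength) {
                throw new IllegalArgumentException("sequence longer than maxLength");
            }
            if (lengths[i] > 0) {
                mask[i][lengths[i] - 1] = 1.0;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        double[][] mask = lastStepMask(new int[]{3, 1, 2}, 3);
        System.out.println(Arrays.deepToString(mask));
        // prints [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    }
}
```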
      • setLabelsMaskArray

        void setLabelsMaskArray​(INDArray labelsMask)
        Set the labels mask array in this data set
      • hasMaskArrays

        boolean hasMaskArrays()
        Whether the labels or input (features) mask arrays are present for this DataSet
      • setExampleMetaData

        void setExampleMetaData​(List<? extends Serializable> exampleMetaData)
        Set the metadata for this DataSet
        By convention: the metadata can be any serializable object, one per example in the DataSet
        Parameters:
        exampleMetaData - Example metadata to set
      • getExampleMetaData

        <T extends Serializable> List<T> getExampleMetaData(Class<T> metaDataType)
        Get the example metadata, or null if no metadata has been set
        Note: this method results in an unchecked cast - care should be taken when using this!
        Type Parameters:
        T - Type of metadata
        Parameters:
        metaDataType - Class of the metadata (used for type information)
        Returns:
        List of metadata objects
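        The unchecked-cast warning above is easiest to see in a small example. The plain-Java sketch below uses a hypothetical holder class that mimics only the typed getter, not the real DataSet implementation (for instance, it returns an empty list rather than null when nothing has been set). It shows that asking for the wrong type does not fail at the call itself, only later when an element is used.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder that mimics a typed metadata getter with an unchecked cast.
class MetaDataCastSketch {
    private final List<Serializable> metaData = new ArrayList<>();

    void setExampleMetaData(List<? extends Serializable> exampleMetaData) {
        metaData.clear();
        metaData.addAll(exampleMetaData);
    }

    @SuppressWarnings("unchecked")
    <T extends Serializable> List<T> getExampleMetaData(Class<T> metaDataType) {
        // Unchecked: nothing verifies that the elements are instances of metaDataType.
        return (List<T>) (List<?>) metaData;
    }

    public static void main(String[] args) {
        MetaDataCastSketch holder = new MetaDataCastSketch();
        holder.setExampleMetaData(Arrays.asList("example-0", "example-1"));

        // Correct type: works as expected.
        List<String> names = holder.getExampleMetaData(String.class);
        System.out.println(names.get(0));

        // Wrong type: the call succeeds, because generics are erased at runtime.
        List<Integer> wrong = holder.getExampleMetaData(Integer.class);
        try {
            Integer first = wrong.get(0); // the cast to Integer happens here
            System.out.println(first);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException on element access");
        }
    }
}
```

        A caller who wants to fail early could check each element with metaDataType.isInstance(...) before using the list.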
      • getExampleMetaData

        List<Serializable> getExampleMetaData()
        Get the example metadata, or null if no metadata has been set
        Returns:
        List of metadata instances
      • getMemoryFootprint

        long getMemoryFootprint()
        This method returns memory used by this DataSet
        Returns:
        Memory used by this DataSet
      • migrate

        void migrate()
        This method migrates this DataSet into current Workspace (if any)
      • detach

        void detach()
        This method detaches this DataSet from current Workspace (if any)
      • isEmpty

        boolean isEmpty()
        Returns:
        true if the DataSet object is empty (no features, labels, or masks)