Package org.nd4j.linalg.dataset.api
Interface DataSet
-
- All Superinterfaces:
Iterable<DataSet>
,Serializable
- All Known Implementing Classes:
DataSet
public interface DataSet extends Iterable<DataSet>, Serializable
-
-
Method Summary
Modifier and Type Method Description
void
addFeatureVector(INDArray toAdd)
void
addFeatureVector(INDArray feature, int example)
void
addRow(DataSet d, int i)
List<DataSet>
asList()
Extract each example in the DataSet into its own DataSet object, and return all of them as a list.
List<DataSet>
batchBy(int num)
List<DataSet>
batchByNumLabels()
void
binarize()
void
binarize(double cutoff)
DataSet
copy()
Create a copy of the DataSet.
List<DataSet>
dataSetBatches(int num)
Deprecated. Prefer batchBy(int).
void
detach()
Detaches this DataSet from the current Workspace (if any).
void
divideBy(int num)
Divide the features by a scalar.
INDArray
exampleMaxs()
INDArray
exampleMeans()
INDArray
exampleSums()
void
filterAndStrip(int[] labels)
DataSet
filterBy(int[] labels)
DataSet
get(int i)
DataSet
get(int[] i)
List<String>
getColumnNames()
List<Serializable>
getExampleMetaData()
Get the example metadata, or null if no metadata has been set.
<T extends Serializable> List<T>
getExampleMetaData(Class<T> metaDataType)
Get the example metadata, or null if no metadata has been set.
Note: this method performs an unchecked cast, so take care when using it.
INDArray
getFeatures()
Returns the features array for the DataSet.
INDArray
getFeaturesMaskArray()
Input mask array: a mask array for the input, where each value is in {0,1} to specify whether an input is actually present or not.
String
getLabelName(int idx)
List<String>
getLabelNames()
Deprecated.
List<String>
getLabelNames(INDArray idxs)
List<String>
getLabelNamesList()
INDArray
getLabels()
INDArray
getLabelsMaskArray()
Labels (output) mask array: a mask array for the output, where each value is in {0,1} to specify whether an output is actually present or not.
long
getMemoryFootprint()
Returns the memory used by this DataSet.
DataSet
getRange(int from, int to)
boolean
hasMaskArrays()
Whether the labels or input (features) mask arrays are present for this DataSet.
String
id()
boolean
isEmpty()
DataSetIterator
iterateWithMiniBatches()
Deprecated.
Iterator<DataSet>
iterator()
Map<Integer,Double>
labelCounts()
Calculate and return a count of each label, by index.
void
load(File from)
Load the contents of the DataSet from the specified File.
void
load(InputStream from)
Load the contents of the DataSet from the specified InputStream.
void
migrate()
Migrates this DataSet into the current Workspace (if any).
void
multiplyBy(double num)
Multiply the features by a scalar.
void
normalize()
Normalize this DataSet to mean 0, stdev 1 per input.
void
normalizeZeroMeanZeroUnitVariance()
Deprecated.
int
numExamples()
Number of examples in the DataSet.
int
numInputs()
Number of input values, i.e., the size of the features INDArray per example.
int
numOutcomes()
Returns the number of outcomes (size of the labels array for each example).
int
outcome()
DataSet
reshape(int rows, int cols)
void
roundToTheNearest(int roundTo)
DataSet
sample(int numSamples)
DataSet
sample(int numSamples, boolean withReplacement)
DataSet
sample(int numSamples, Random rng)
DataSet
sample(int numSamples, Random rng, boolean withReplacement)
void
save(File to)
Save this DataSet to a file.
void
save(OutputStream to)
Write the contents of this DataSet to the specified OutputStream.
void
scale()
void
scaleMinAndMax(double min, double max)
void
setColumnNames(List<String> columnNames)
void
setExampleMetaData(List<? extends Serializable> exampleMetaData)
Set the metadata for this DataSet
By convention, the metadata can be any serializable object, one per example in the DataSet.
void
setFeatures(INDArray features)
Set the features array for the DataSet.
void
setFeaturesMaskArray(INDArray inputMask)
Set the features mask array in this DataSet.
void
setLabelNames(List<String> labelNames)
void
setLabels(INDArray labels)
void
setLabelsMaskArray(INDArray labelsMask)
Set the labels mask array in this DataSet.
void
setNewNumberOfLabels(int labels)
void
setOutcome(int example, int label)
void
shuffle()
Shuffle the order of the rows in the DataSet.
List<DataSet>
sortAndBatchByNumLabels()
void
sortByLabel()
SplitTestAndTrain
splitTestAndTrain(double fractionTrain)
Split the DataSet into two DataSets randomly.
SplitTestAndTrain
splitTestAndTrain(int numHoldout)
SplitTestAndTrain
splitTestAndTrain(int numHoldout, Random rnd)
void
squishToRange(double min, double max)
MultiDataSet
toMultiDataSet()
void
validate()
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
-
-
-
Method Detail
-
getRange
DataSet getRange(int from, int to)
-
load
void load(InputStream from)
Load the contents of the DataSet from the specified InputStream. The current contents of the DataSet (if any) will be replaced.
The InputStream should contain a DataSet that has been serialized with save(OutputStream).
- Parameters:
from
- InputStream to load the DataSet from
-
load
void load(File from)
Load the contents of the DataSet from the specified File. The current contents of the DataSet (if any) will be replaced.
The File should contain a DataSet that has been serialized with save(File).
- Parameters:
from
- File to load the DataSet from
-
save
void save(OutputStream to)
Write the contents of this DataSet to the specified OutputStream.
- Parameters:
to
- OutputStream to save the DataSet to
-
save
void save(File to)
Save this DataSet to a file. Can be loaded again using load(File).
- Parameters:
to
- File to save the DataSet to
-
iterateWithMiniBatches
@Deprecated DataSetIterator iterateWithMiniBatches()
Deprecated.
-
id
String id()
-
getFeatures
INDArray getFeatures()
Returns the features array for the DataSet.
- Returns:
- features array
-
setFeatures
void setFeatures(INDArray features)
Set the features array for the DataSet.
- Parameters:
features
- Features to set
-
labelCounts
Map<Integer,Double> labelCounts()
Calculate and return a count of each label, by index. Assumes the labels are a one-hot INDArray, as used for classification.
- Returns:
- Map of counts
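To make the one-hot assumption concrete, here is a minimal plain-Java sketch of the counting step. It uses a `double[][]` in place of an INDArray and does not call ND4J. The class and method names are hypothetical, and whether the real method reports classes with zero examples is not stated here, so this sketch simply omits them.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative stand-in only: counts labels from a one-hot matrix (one row per example). */
final class LabelCountSketch {

    /** Returns a map from class index to the number of rows whose largest entry is at that index. */
    static Map<Integer, Double> labelCounts(double[][] oneHotLabels) {
        Map<Integer, Double> counts = new TreeMap<>();
        for (double[] row : oneHotLabels) {
            // For a one-hot row, the position of the largest value is the class index.
            int argMax = 0;
            for (int c = 1; c < row.length; c++) {
                if (row[c] > row[argMax]) {
                    argMax = c;
                }
            }
            counts.merge(argMax, 1.0, Double::sum);
        }
        return counts;
    }
}
```

With three examples labelled class 0, class 2, and class 0, the sketch yields a count of 2.0 for index 0 and 1.0 for index 2.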
-
copy
DataSet copy()
Create a copy of the DataSet.
- Returns:
- Copy of the DataSet
-
reshape
DataSet reshape(int rows, int cols)
-
multiplyBy
void multiplyBy(double num)
Multiply the features by a scalar
-
divideBy
void divideBy(int num)
Divide the features by a scalar
-
shuffle
void shuffle()
Shuffle the order of the rows in the DataSet. Note that this generally won't make any difference in practice unless the DataSet is later split.
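A row shuffle has to move each example's features and labels together, otherwise the pairing is lost. The following plain-Java sketch shows one way to do that with a shared Fisher-Yates permutation. It uses `double[][]` arrays rather than INDArray, the names are hypothetical, and it is not the ND4J implementation.

```java
import java.util.Random;

/** Illustrative stand-in only: shuffles feature rows and label rows with one shared permutation. */
final class ShuffleSketch {

    /** Shuffles both arrays in place so that row i of features still matches row i of labels. */
    static void shuffleRows(double[][] features, double[][] labels, Random rng) {
        if (features.length != labels.length) {
            throw new IllegalArgumentException("features and labels need the same number of rows");
        }
        // Fisher-Yates: swap position i with a random position in [0, i].
        for (int i = features.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            double[] f = features[i];
            features[i] = features[j];
            features[j] = f;
            double[] l = labels[i];
            labels[i] = labels[j];
            labels[j] = l;
        }
    }
}
```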
-
squishToRange
void squishToRange(double min, double max)
-
scaleMinAndMax
void scaleMinAndMax(double min, double max)
-
scale
void scale()
-
addFeatureVector
void addFeatureVector(INDArray toAdd)
-
addFeatureVector
void addFeatureVector(INDArray feature, int example)
-
normalize
void normalize()
Normalize this DataSet to mean 0, stdev 1 per input. This calculates statistics based on the values in a single DataSet only. For normalization over multiple DataSet objects, use NormalizerStandardize.
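As a rough illustration of "mean 0, stdev 1 per input", the plain-Java sketch below standardizes each column of a `double[][]`. It is not the ND4J code: the names are hypothetical, and the use of the population standard deviation (dividing by the number of rows) and the handling of constant columns are assumptions of this sketch.

```java
/** Illustrative stand-in only: per-column standardization of a features matrix. */
final class StandardizeSketch {

    /** Rescales each column in place to mean 0 and standard deviation 1. */
    static void standardizeColumns(double[][] features) {
        int rows = features.length;
        if (rows == 0) {
            return;
        }
        int cols = features[0].length;
        for (int c = 0; c < cols; c++) {
            double mean = 0.0;
            for (double[] row : features) {
                mean += row[c];
            }
            mean /= rows;

            double sumSq = 0.0;
            for (double[] row : features) {
                sumSq += (row[c] - mean) * (row[c] - mean);
            }
            // Assumption: population standard deviation (divide by rows, not rows - 1).
            double std = Math.sqrt(sumSq / rows);
            if (std == 0.0) {
                std = 1.0; // constant column: only centre it, avoid dividing by zero
            }
            for (double[] row : features) {
                row[c] = (row[c] - mean) / std;
            }
        }
    }
}
```

Because the statistics come from the rows passed in, two separate batches standardized this way end up on different scales, which is why a shared normalizer is preferable across multiple DataSet objects.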
-
binarize
void binarize()
-
binarize
void binarize(double cutoff)
-
normalizeZeroMeanZeroUnitVariance
@Deprecated void normalizeZeroMeanZeroUnitVariance()
Deprecated.
-
numInputs
int numInputs()
Number of input values - i.e., size of the features INDArray per example
-
validate
void validate()
-
outcome
int outcome()
-
setNewNumberOfLabels
void setNewNumberOfLabels(int labels)
-
setOutcome
void setOutcome(int example, int label)
-
get
DataSet get(int i)
-
get
DataSet get(int[] i)
-
filterBy
DataSet filterBy(int[] labels)
-
filterAndStrip
void filterAndStrip(int[] labels)
-
dataSetBatches
@Deprecated List<DataSet> dataSetBatches(int num)
Deprecated. Prefer batchBy(int).
-
asList
List<DataSet> asList()
Extract each example in the DataSet into its own DataSet object, and return all of them as a list.
- Returns:
- List of DataSet objects, each with 1 example only
-
splitTestAndTrain
SplitTestAndTrain splitTestAndTrain(int numHoldout, Random rnd)
-
splitTestAndTrain
SplitTestAndTrain splitTestAndTrain(int numHoldout)
-
getLabels
INDArray getLabels()
-
setLabels
void setLabels(INDArray labels)
-
sortByLabel
void sortByLabel()
-
addRow
void addRow(DataSet d, int i)
-
exampleSums
INDArray exampleSums()
-
exampleMaxs
INDArray exampleMaxs()
-
exampleMeans
INDArray exampleMeans()
-
sample
DataSet sample(int numSamples)
-
sample
DataSet sample(int numSamples, boolean withReplacement)
-
roundToTheNearest
void roundToTheNearest(int roundTo)
-
numOutcomes
int numOutcomes()
Returns the number of outcomes (size of the labels array for each example)
-
numExamples
int numExamples()
Number of examples in the DataSet
-
getLabelNames
@Deprecated List<String> getLabelNames()
Deprecated.
-
getLabelName
String getLabelName(int idx)
-
splitTestAndTrain
SplitTestAndTrain splitTestAndTrain(double fractionTrain)
Split the DataSet into two DataSets randomly.
- Parameters:
fractionTrain
- Fraction (in range 0 to 1) of examples to be returned in the training DataSet object
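The fraction-based split can be pictured as shuffling the example indices and cutting the list at `fractionTrain * numExamples`. The plain-Java sketch below does exactly that. The names are hypothetical, the rounding (truncation toward zero) is an assumption, and the real method returns a SplitTestAndTrain object rather than index lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative stand-in only: random train/test split expressed as two lists of row indices. */
final class SplitSketch {

    /** Returns [trainIndices, testIndices] for a random split of numExamples rows. */
    static List<List<Integer>> split(int numExamples, double fractionTrain, Random rng) {
        if (fractionTrain < 0.0 || fractionTrain > 1.0) {
            throw new IllegalArgumentException("fractionTrain must be in the range 0 to 1");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            order.add(i);
        }
        Collections.shuffle(order, rng);

        // Assumption: the training size is truncated, not rounded.
        int numTrain = (int) (fractionTrain * numExamples);
        List<Integer> train = new ArrayList<>(order.subList(0, numTrain));
        List<Integer> test = new ArrayList<>(order.subList(numTrain, numExamples));
        return Arrays.asList(train, test);
    }
}
```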
-
getFeaturesMaskArray
INDArray getFeaturesMaskArray()
Input mask array: a mask array for the input, where each value is in {0,1} to specify whether an input is actually present or not. Typically used for situations such as RNNs with variable-length inputs.
- Returns:
- Input mask array
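For variable-length sequences padded to a common length, a features mask marks which time steps hold real data. The plain-Java sketch below builds such a mask from a list of sequence lengths. The [numSequences x maxLength] layout and the names are assumptions of this sketch, and the result is a `double[][]`, not an INDArray.

```java
/** Illustrative stand-in only: builds a {0,1} input mask for padded sequences. */
final class FeatureMaskSketch {

    /** Returns a mask with 1.0 at real time steps and 0.0 at padded ones. */
    static double[][] maskFromLengths(int[] lengths, int maxLength) {
        double[][] mask = new double[lengths.length][maxLength]; // initialised to 0.0
        for (int i = 0; i < lengths.length; i++) {
            if (lengths[i] < 0 || lengths[i] > maxLength) {
                throw new IllegalArgumentException("length out of range at index " + i);
            }
            for (int t = 0; t < lengths[i]; t++) {
                mask[i][t] = 1.0;
            }
        }
        return mask;
    }
}
```

For lengths 3 and 1 padded to 3 steps, the rows are {1, 1, 1} and {1, 0, 0}.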
-
setFeaturesMaskArray
void setFeaturesMaskArray(INDArray inputMask)
Set the features mask array in this DataSet
-
getLabelsMaskArray
INDArray getLabelsMaskArray()
Labels (output) mask array: a mask array for the output, where each value is in {0,1} to specify whether an output is actually present or not. Typically used for situations such as RNNs with variable-length inputs or many-to-one situations.
- Returns:
- Labels (output) mask array
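In a many-to-one setup, only one time step per sequence carries a label, so the labels mask is 1 at that step and 0 elsewhere. The plain-Java sketch below assumes the label sits at the last real time step of each sequence and that the mask has a [numSequences x maxLength] layout. Both are assumptions of the sketch, and the names are hypothetical.

```java
/** Illustrative stand-in only: builds a {0,1} labels mask for many-to-one sequences. */
final class LabelMaskSketch {

    /** Returns a mask with a single 1.0 per row, at the last real time step of that sequence. */
    static double[][] lastStepMask(int[] lengths, int maxLength) {
        double[][] mask = new double[lengths.length][maxLength]; // initialised to 0.0
        for (int i = 0; i < lengths.length; i++) {
            if (lengths[i] < 1 || lengths[i] > maxLength) {
                throw new IllegalArgumentException("length out of range at index " + i);
            }
            mask[i][lengths[i] - 1] = 1.0;
        }
        return mask;
    }
}
```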
-
setLabelsMaskArray
void setLabelsMaskArray(INDArray labelsMask)
Set the labels mask array in this data set
-
hasMaskArrays
boolean hasMaskArrays()
Whether the labels or input (features) mask arrays are present for this DataSet
-
setExampleMetaData
void setExampleMetaData(List<? extends Serializable> exampleMetaData)
Set the metadata for this DataSet
By convention, the metadata can be any serializable object, one per example in the DataSet.
- Parameters:
exampleMetaData
- Example metadata to set
-
getExampleMetaData
<T extends Serializable> List<T> getExampleMetaData(Class<T> metaDataType)
Get the example metadata, or null if no metadata has been set
Note: this method performs an unchecked cast, so take care when using it.
- Type Parameters:
T
- Type of metadata
- Parameters:
metaDataType
- Class of the metadata (used for type information)
- Returns:
- List of metadata objects
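The unchecked-cast warning matters because a wrong type argument does not fail at the call itself, only later when an element is read. The plain-Java sketch below reproduces that hazard with a hypothetical holder class, and shows a stricter variant that checks each element with `Class.cast`. It is not the ND4J implementation.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in only: a metadata holder with an unchecked and a checked accessor. */
final class MetaDataCastSketch {

    private final List<Serializable> metaData = new ArrayList<>();

    void add(Serializable item) {
        metaData.add(item);
    }

    /** Same shape as the documented signature; the elements are NOT verified against metaDataType. */
    @SuppressWarnings("unchecked")
    <T extends Serializable> List<T> get(Class<T> metaDataType) {
        return (List<T>) (List<?>) metaData;
    }

    /** Stricter variant: throws ClassCastException immediately if any element has the wrong type. */
    <T extends Serializable> List<T> getChecked(Class<T> metaDataType) {
        List<T> out = new ArrayList<>();
        for (Serializable item : metaData) {
            out.add(metaDataType.cast(item));
        }
        return out;
    }
}
```

If the holder contains a String and the caller asks for `Integer.class`, `get` returns normally and the ClassCastException only appears when an element is assigned to an `Integer`, whereas `getChecked` fails at the call.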
-
getExampleMetaData
List<Serializable> getExampleMetaData()
Get the example metadata, or null if no metadata has been set.
- Returns:
- List of metadata instances
-
getMemoryFootprint
long getMemoryFootprint()
Returns the memory used by this DataSet.
- Returns:
- Memory used by this DataSet
-
migrate
void migrate()
This method migrates this DataSet into current Workspace (if any)
-
detach
void detach()
This method detaches this DataSet from current Workspace (if any)
-
isEmpty
boolean isEmpty()
- Returns:
- true if the DataSet object is empty (no features, labels, or masks)
-
toMultiDataSet
MultiDataSet toMultiDataSet()
-
-