DataSet (deeplearning4j-core 0.0.3.1 API)

java.lang.Object
- org.deeplearning4j.berkeley.Pair<org.jblas.DoubleMatrix,org.jblas.DoubleMatrix>
- - org.deeplearning4j.datasets.DataSet

All Implemented Interfaces:

Serializable, Iterable<DataSet>, Persistable

Direct Known Subclasses:

Example
```
public class DataSet
extends Pair<org.jblas.DoubleMatrix,org.jblas.DoubleMatrix>
implements Persistable, Iterable<DataSet>
```
A data set of example/outcome pairs. The outcomes are encoded for neural network training: every label that is considered true is a 1, and the rest are 0s.
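
The outcome encoding described above can be illustrated with a minimal sketch. It uses plain Java arrays rather than `org.jblas.DoubleMatrix`, and the class `OneHotSketch` and its methods are hypothetical names that are not part of deeplearning4j.

```java
public class OneHotSketch {
    /** Builds a row with 1.0 at the index of the true label and 0.0 elsewhere. */
    public static double[] encode(int label, int numLabels) {
        if (label < 0 || label >= numLabels) {
            throw new IllegalArgumentException("label out of range: " + label);
        }
        double[] row = new double[numLabels]; // Java zero-initializes new arrays
        row[label] = 1.0;
        return row;
    }

    /** Recovers the label as the index of the largest entry in the row. */
    public static int decode(double[] row) {
        int best = 0;
        for (int i = 1; i < row.length; i++) {
            if (row[i] > row[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

In this sketch, each row of an outcome matrix would be one such encoded vector, so the number of columns equals the number of possible labels.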

Author:

Adam Gibson

See Also:
Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class org.deeplearning4j.berkeley.Pair
  Pair.DefaultLexicographicPairComparator<F extends Comparable<F>,S extends Comparable<S>>, Pair.FirstComparator<S extends Comparable<? super S>,T>, Pair.LexicographicPairComparator<F,S>, Pair.ReverseFirstComparator<S extends Comparable<? super S>,T>, Pair.ReverseSecondComparator<S,T extends Comparable<? super T>>, Pair.SecondComparator<S,T extends Comparable<? super T>>

Constructor Summary

Constructors
Constructor and Description
`DataSet()`
`DataSet(org.jblas.DoubleMatrix first, org.jblas.DoubleMatrix second)`
`DataSet(Pair<org.jblas.DoubleMatrix,org.jblas.DoubleMatrix> pair)`

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`addFeatureVector(org.jblas.DoubleMatrix toAdd)` Adds a feature for each example on to the current feature vector
`void`	`addFeatureVector(org.jblas.DoubleMatrix feature, int example)` Appends a feature vector to the given example (row)
`void`	`addRow(DataSet d, int i)`
`List<DataSet>`	`asList()`
`List<List<DataSet>>`	`batchBy(int num)`
`List<List<DataSet>>`	`batchByNumLabels()`
`DataSet`	`copy()`
`List<DataSet>`	`dataSetBatches(int num)` Partitions the data set by the specified number.
`void`	`divideBy(int num)`
`static DataSet`	`empty()`
`org.jblas.DoubleMatrix`	`exampleMaxs()`
`org.jblas.DoubleMatrix`	`exampleMeans()`
`org.jblas.DoubleMatrix`	`exampleSums()`
`void`	`filterAndStrip(int[] labels)` Strips the dataset down to the specified labels and remaps them
`DataSet`	`filterBy(int[] labels)` Strips the data set of all but the passed in labels
`DataSet`	`get(int i)` Gets a copy of example i
`Iterator<DataSet>`	`iterator()`
`DataSetIterator`	`iterator(int batches)`
`static DataSet`	`load(File path)`
`void`	`load(InputStream is)`
`static void`	`main(String[] args)`
`static DataSet`	`merge(List<DataSet> data)`
`void`	`multiplyBy(int num)`
`void`	`normalize()`
`void`	`normalizeZeroMeanZeroUnitVariance()`
`int`	`numExamples()`
`int`	`numInputs()`
`int`	`numOutcomes()`
`int`	`outcome()`
`Counter<Integer>`	`outcomeCounts()` Gets the label distribution (counts of each possible outcome)
`void`	`roundInputToTheNearest(int numDecimalPlaces)`
`void`	`roundToTheNearest(int roundTo)`
`DataSet`	`sample(int numSamples)` Samples without replacement, using an internally chosen random rng
`DataSet`	`sample(int numSamples, boolean withReplacement)` Sample a dataset numSamples times
`DataSet`	`sample(int numSamples, org.apache.commons.math3.random.RandomGenerator rng)` Sample without replacement
`DataSet`	`sample(int numSamples, org.apache.commons.math3.random.RandomGenerator rng, boolean withReplacement)` Sample a dataset
`void`	`saveTo(File file, boolean binary)`
`void`	`scale()`
`void`	`setNewNumberOfLabels(int labels)` Clears the outcome matrix setting a new number of labels
`void`	`setOutcome(int example, int label)` Sets the outcome of a particular example
`void`	`shuffle()`
`List<List<DataSet>>`	`sortAndBatchByNumLabels()` Sorts the dataset by label: Splits the data set such that examples are sorted by their labels.
`void`	`sortByLabel()` Organizes the dataset to minimize sampling error while still allowing efficient batching.
`Pair<DataSet,DataSet>`	`splitTestAndTrain(int numHoldout)`
`String`	`toString()`
`void`	`validate()`
`void`	`write(OutputStream os)`

Methods inherited from class org.deeplearning4j.berkeley.Pair
equals, getFirst, getSecond, hashCode, makePair, newPair, reverse, setFirst, setSecond

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - DataSet
```
public DataSet()
```
  - DataSet
```
public DataSet(Pair<org.jblas.DoubleMatrix,org.jblas.DoubleMatrix> pair)
```
  - DataSet
```
public DataSet(org.jblas.DoubleMatrix first,
       org.jblas.DoubleMatrix second)
```
- Method Detail
  - iterator
```
public DataSetIterator iterator(int batches)
```
  - copy
```
public DataSet copy()
```
  - empty
```
public static DataSet empty()
```
  - merge
```
public static DataSet merge(List<DataSet> data)
```
  - multiplyBy
```
public void multiplyBy(int num)
```
  - divideBy
```
public void divideBy(int num)
```
  - shuffle
```
public void shuffle()
```
  - roundInputToTheNearest
```
public void roundInputToTheNearest(int numDecimalPlaces)
```
  - scale
```
public void scale()
```
  - addFeatureVector
```
public void addFeatureVector(org.jblas.DoubleMatrix toAdd)
```
    Adds a feature for each example on to the current feature vector
    
    Parameters:
    toAdd - the feature vector to add
  - addFeatureVector
```
public void addFeatureVector(org.jblas.DoubleMatrix feature,
                    int example)
```
    Appends a feature vector to the given example (row)
    
    Parameters:
    feature - the feature vector to add
    example - the number of the example to append to
  - normalize
```
public void normalize()
```
  - normalizeZeroMeanZeroUnitVariance
```
public void normalizeZeroMeanZeroUnitVariance()
```
  - numInputs
```
public int numInputs()
```
  - validate
```
public void validate()
```
  - outcome
```
public int outcome()
```
  - setNewNumberOfLabels
```
public void setNewNumberOfLabels(int labels)
```
    Clears the outcome matrix setting a new number of labels
    
    Parameters:
    labels - the number of labels/columns in the outcome matrix. Note that this clears the labels for each example.
  - setOutcome
```
public void setOutcome(int example,
              int label)
```
    Sets the outcome of a particular example
    
    Parameters:
    example - the example to set
    label - the label of the outcome
  - get
```
public DataSet get(int i)
```
    Gets a copy of example i
    
    Parameters:
    i - the example to get
    
    Returns:
    the example at i (one example)
  - batchBy
```
public List<List<DataSet>> batchBy(int num)
```
  - outcomeCounts
```
public Counter<Integer> outcomeCounts()
```
    Gets the label distribution (counts of each possible outcome)
    
    Returns:
    the counts of each possible outcome
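
    A label distribution of this kind can be sketched as follows. This is an illustration only: it uses a plain `double[][]` for the outcome matrix and a `Map` in place of `Counter<Integer>`, and `OutcomeCountsSketch` is a hypothetical name, not part of the library.

```java
import java.util.Map;
import java.util.TreeMap;

public class OutcomeCountsSketch {
    /** Counts how many rows of a one-hot outcome matrix carry each label. */
    public static Map<Integer, Integer> count(double[][] outcomes) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (double[] row : outcomes) {
            // The label of a row is taken to be the column holding its largest value.
            int label = 0;
            for (int j = 1; j < row.length; j++) {
                if (row[j] > row[label]) {
                    label = j;
                }
            }
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }
}
```

    Labels that never occur are simply absent from the map in this sketch.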
  - filterBy
```
public DataSet filterBy(int[] labels)
```
    Strips the data set of all but the passed in labels
    
    Parameters:
    labels - the labels to keep
    
    Returns:
    the dataset with only the specified labels
  - filterAndStrip
```
public void filterAndStrip(int[] labels)
```
    Strips the dataset down to the specified labels and remaps them
    
    Parameters:
    labels - the labels to strip down to
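
    One way to picture "strip and remap" is sketched below. The remapping scheme (each kept label is renumbered to its position in the `keep` array) is an assumption made for illustration, and `FilterAndRemapSketch` is a hypothetical class that works on an `int[]` of labels instead of a `DataSet`.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterAndRemapSketch {
    /**
     * Keeps only the examples whose label appears in keep.
     * Each result entry is {originalRowIndex, remappedLabel}, where the
     * remapped label is the position of the original label in keep.
     */
    public static List<int[]> filter(int[] labels, int[] keep) {
        List<int[]> kept = new ArrayList<>();
        for (int row = 0; row < labels.length; row++) {
            for (int k = 0; k < keep.length; k++) {
                if (labels[row] == keep[k]) {
                    kept.add(new int[] {row, k});
                    break; // a row has one label, so stop at the first match
                }
            }
        }
        return kept;
    }
}
```

    For example, keeping labels 3 and 7 out of a larger label set would leave a two-label problem with labels 0 and 1 under this assumed scheme.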
  - dataSetBatches
```
public List<DataSet> dataSetBatches(int num)
```
    Partitions the data set by the specified number.
    
    Parameters:
    num - the number to split by
    
    Returns:
    the partitioned data set
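
    The partitioning can be sketched as below. The documentation does not say whether `num` is the batch size or the number of batches; this sketch assumes it is the batch size. `BatchSketch` is a hypothetical class operating on plain arrays rather than on a `DataSet`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSketch {
    /** Splits rows into consecutive batches of at most batchSize rows each. */
    public static List<double[][]> partition(double[][] rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<double[][]> batches = new ArrayList<>();
        for (int start = 0; start < rows.length; start += batchSize) {
            // The last batch may be shorter when the row count is not a multiple of batchSize.
            int end = Math.min(start + batchSize, rows.length);
            batches.add(Arrays.copyOfRange(rows, start, end));
        }
        return batches;
    }
}
```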
  - sortAndBatchByNumLabels
```
public List<List<DataSet>> sortAndBatchByNumLabels()
```
    Sorts the dataset by label: splits the data set such that examples are sorted by their labels. A ten-label dataset would produce lists with batches like the following: x1 y = 1, x2 y = 2, ..., x10 y = 10
    
    Returns:
    a list of data sets partitioned by outcomes
  - batchByNumLabels
```
public List<List<DataSet>> batchByNumLabels()
```
  - asList
```
public List<DataSet> asList()
```
  - splitTestAndTrain
```
public Pair<DataSet,DataSet> splitTestAndTrain(int numHoldout)
```
  - sortByLabel
```
public void sortByLabel()
```
    Organizes the dataset to minimize sampling error while still allowing efficient batching.
  - addRow
```
public void addRow(DataSet d,
          int i)
```
  - exampleSums
```
public org.jblas.DoubleMatrix exampleSums()
```
  - exampleMaxs
```
public org.jblas.DoubleMatrix exampleMaxs()
```
  - exampleMeans
```
public org.jblas.DoubleMatrix exampleMeans()
```
  - saveTo
```
public void saveTo(File file,
          boolean binary)
            throws IOException
```
    Throws:
    
    IOException
  - load
```
public static DataSet load(File path)
                    throws IOException
```
    Throws:
    
    IOException
  - sample
```
public DataSet sample(int numSamples)
```
    Samples without replacement, using an internally chosen random rng
    
    Parameters:
    numSamples - the number of samples to get
    
    Returns:
    a sample data set without replacement
  - sample
```
public DataSet sample(int numSamples,
             org.apache.commons.math3.random.RandomGenerator rng)
```
    Sample without replacement
    
    Parameters:
    numSamples - the number of samples to get
    rng - the rng to use
    
    Returns:
    the sampled dataset without replacement
  - sample
```
public DataSet sample(int numSamples,
             boolean withReplacement)
```
    Sample a dataset numSamples times
    
    Parameters:
    numSamples - the number of samples to get
    withReplacement - whether to sample with replacement (allowing duplicates)
    
    Returns:
    the sampled dataset
  - sample
```
public DataSet sample(int numSamples,
             org.apache.commons.math3.random.RandomGenerator rng,
             boolean withReplacement)
```
    Sample a dataset
    
    Parameters:
    numSamples - the number of samples to get
    rng - the rng to use
    withReplacement - whether to allow duplicates (only tracked by example row number)
    
    Returns:
    the sample dataset
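
    The difference between the two sampling modes can be sketched by choosing example row numbers. This is not the library's implementation: it uses `java.util.Random` instead of `org.apache.commons.math3.random.RandomGenerator`, and `SampleSketch.sampleRows` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SampleSketch {
    /** Picks numSamples row indices from the range [0, numExamples). */
    public static int[] sampleRows(int numExamples, int numSamples, Random rng, boolean withReplacement) {
        if (numExamples <= 0) {
            throw new IllegalArgumentException("numExamples must be positive");
        }
        int[] picked = new int[numSamples];
        if (withReplacement) {
            // Every draw is independent, so the same row may appear more than once.
            for (int i = 0; i < numSamples; i++) {
                picked[i] = rng.nextInt(numExamples);
            }
            return picked;
        }
        if (numSamples > numExamples) {
            throw new IllegalArgumentException("cannot draw more rows than exist without replacement");
        }
        // Shuffle all row indices and keep the first numSamples, so no row repeats.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < numExamples; i++) {
            rows.add(i);
        }
        Collections.shuffle(rows, rng);
        for (int i = 0; i < numSamples; i++) {
            picked[i] = rows.get(i);
        }
        return picked;
    }
}
```

    Duplicates here are tracked by row number only, matching the note on `withReplacement` above: two distinct rows with identical contents are still treated as different examples.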
  - roundToTheNearest
```
public void roundToTheNearest(int roundTo)
```
  - numOutcomes
```
public int numOutcomes()
```
  - numExamples
```
public int numExamples()
```
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Pair<org.jblas.DoubleMatrix,org.jblas.DoubleMatrix>
  - main
```
public static void main(String[] args)
                 throws IOException
```
    Throws:
    
    IOException
  - write
```
public void write(OutputStream os)
```
    Specified by:
    
    write in interface Persistable
  - load
```
public void load(InputStream is)
```
    Specified by:
    
    load in interface Persistable
  - iterator
```
public Iterator<DataSet> iterator()
```
    Specified by:
    
    iterator in interface Iterable<DataSet>
