Class TransformProcess.Builder
- java.lang.Object
-
- org.datavec.api.transform.TransformProcess.Builder
-
- Enclosing class:
- TransformProcess
public static class TransformProcess.Builder extends Object
Builder class for constructing a TransformProcess
-
-
Method Summary
Modifier and Type | Method | Description
TransformProcess.Builder
addConstantColumn(String newColumnName, ColumnType newColumnType, Writable fixedValue)
Add a new column, where all values in the column are identical and as specified.
TransformProcess.Builder
addConstantDoubleColumn(String newColumnName, double value)
Add a new double column, where the value for that column is identical for all records.
TransformProcess.Builder
addConstantIntegerColumn(String newColumnName, int value)
Add a new integer column, where the value for that column is identical for all records.
TransformProcess.Builder
addConstantLongColumn(String newColumnName, long value)
Add a new long column, where the value for that column is identical for all records.
TransformProcess.Builder
appendStringColumnTransform(String column, String toAppend)
Append a String to a specified column.
TransformProcess
build()
Create the TransformProcess object.
TransformProcess.Builder
calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator)
CalculateSortedRank: calculate the rank of each example, after sorting the examples.
TransformProcess.Builder
calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator, boolean ascending)
CalculateSortedRank: calculate the rank of each example, after sorting the examples.
TransformProcess.Builder
categoricalToInteger(String... columnNames)
Convert the specified column(s) from a categorical representation to an integer representation.
TransformProcess.Builder
categoricalToOneHot(String... columnNames)
Convert the specified column(s) from a categorical representation to a one-hot representation.
TransformProcess.Builder
conditionalCopyValueTransform(String columnToReplace, String sourceColumn, Condition condition)
Replace the value in a specified column with a new value taken from another column, if a condition is satisfied/true.
Note that the condition can be any generic condition, including one on other column(s), different from the column that will be modified if the condition is satisfied/true.
TransformProcess.Builder
conditionalReplaceValueTransform(String column, Writable newValue, Condition condition)
Replace the values in a specified column with a specified new value, if some condition holds.
TransformProcess.Builder
conditionalReplaceValueTransformWithDefault(String column, Writable yesVal, Writable noVal, Condition condition)
Replace the values in a specified column with a specified "yes" value, if some condition holds.
TransformProcess.Builder
convertFromSequence()
Convert a sequence to a set of individual values (by treating each value in each sequence as a separate example).
TransformProcess.Builder
convertToDouble(String inputColumn)
Convert the specified column to a double.
TransformProcess.Builder
convertToInteger(String inputColumn)
Convert the specified column to an integer.
TransformProcess.Builder
convertToSequence()
Convert a set of independent records/examples into a sequence; each example is simply treated as a sequence of length 1, without any join/group operations.
TransformProcess.Builder
convertToSequence(String keyColumn, SequenceComparator comparator)
Convert a set of independent records/examples into a sequence, according to some key.
TransformProcess.Builder
convertToSequence(List<String> keyColumns, SequenceComparator comparator)
Convert a set of independent records/examples into a sequence, where each sequence is grouped according to one or more key values (i.e., the values in one or more columns). Within each sequence, values are ordered using the provided SequenceComparator.
TransformProcess.Builder
convertToString(String inputColumn)
Convert the specified column to a string.
TransformProcess.Builder
doubleColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
Calculate and add a new double column by performing a mathematical operation on a number of existing columns.
TransformProcess.Builder
doubleMathFunction(String columnName, MathFunction mathFunction)
Perform a mathematical operation (such as sin(x), ceil(x), exp(x) etc.) on a column.
TransformProcess.Builder
doubleMathOp(String columnName, MathOp mathOp, double scalar)
Perform a mathematical operation (add, subtract, scalar max etc.) on the specified double column, with a scalar.
TransformProcess.Builder
duplicateColumn(String column, String newName)
Duplicate a single column.
TransformProcess.Builder
duplicateColumns(List<String> columnNames, List<String> newNames)
Duplicate a set of columns.
TransformProcess.Builder
filter(Condition condition)
Add a filter operation, based on the specified condition.
TransformProcess.Builder
filter(Filter filter)
Add a filter operation to be executed after the previously-added operations have been executed.
TransformProcess.Builder
firstDigitTransform(String inputColumn, String outputColumn)
FirstDigitTransform converts a column to a categorical column, with values being the first digit of the number.
For example, "3.1415" becomes "3" and "2.0" becomes "2".
Negative numbers ignore the sign: "-7.123" becomes "7".
Note that two FirstDigitTransform.Mode values are supported, which determine how non-numerical entries should be handled:
EXCEPTION_ON_INVALID: output has 10 category values ("0", ..., "9"), and any non-numerical values result in an exception.
INCLUDE_OTHER_CATEGORY: output has 11 category values ("0", ..., "9", "Other"), and all non-numerical values are mapped to "Other".
FirstDigitTransform is useful (combined with CategoricalToOneHotTransform and Reductions) to implement Benford's law.
TransformProcess.Builder
firstDigitTransform(String inputColumn, String outputColumn, FirstDigitTransform.Mode mode)
FirstDigitTransform converts a column to a categorical column, with values being the first digit of the number.
For example, "3.1415" becomes "3" and "2.0" becomes "2".
Negative numbers ignore the sign: "-7.123" becomes "7".
Note that two FirstDigitTransform.Mode values are supported, which determine how non-numerical entries should be handled:
EXCEPTION_ON_INVALID: output has 10 category values ("0", ..., "9"), and any non-numerical values result in an exception.
INCLUDE_OTHER_CATEGORY: output has 11 category values ("0", ..., "9", "Other"), and all non-numerical values are mapped to "Other".
FirstDigitTransform is useful (combined with CategoricalToOneHotTransform and Reductions) to implement Benford's law.
TransformProcess.Builder
floatColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
Calculate and add a new float column by performing a mathematical operation on a number of existing columns.
TransformProcess.Builder
floatMathFunction(String columnName, MathFunction mathFunction)
Perform a mathematical operation (such as sin(x), ceil(x), exp(x) etc.) on a column.
TransformProcess.Builder
floatMathOp(String columnName, MathOp mathOp, float scalar)
Perform a mathematical operation (add, subtract, scalar max etc.) on the specified float column, with a scalar.
TransformProcess.Builder
integerColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
Calculate and add a new integer column by performing a mathematical operation on a number of existing columns.
TransformProcess.Builder
integerMathOp(String column, MathOp mathOp, int scalar)
Perform a mathematical operation (add, subtract, scalar max etc.) on the specified integer column, with a scalar.
TransformProcess.Builder
integerToCategorical(String columnName, List<String> categoryStateNames)
Convert the specified column from an integer representation (assuming values 0 to numCategories-1) to a categorical representation, given the specified state names.
TransformProcess.Builder
integerToCategorical(String columnName, Map<Integer,String> categoryIndexNameMap)
Convert the specified column from an integer representation to a categorical representation, given the specified mapping between integer indexes and state names.
TransformProcess.Builder
integerToOneHot(String columnName, int minValue, int maxValue)
Convert an integer column to a set of one-hot columns, based on the value in the integer column.
TransformProcess.Builder
longColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
Calculate and add a new long column by performing a mathematical operation on a number of existing columns.
TransformProcess.Builder
longMathOp(String columnName, MathOp mathOp, long scalar)
Perform a mathematical operation (add, subtract, scalar max etc.) on the specified long column, with a scalar.
TransformProcess.Builder
ndArrayColumnsMathOpTransform(String newColumnName, MathOp mathOp, String... columnNames)
Perform an element-wise mathematical operation (such as add, subtract, multiply) on NDArray columns.
TransformProcess.Builder
ndArrayDistanceTransform(String newColumnName, Distance distance, String firstCol, String secondCol)
Calculate a distance (cosine similarity, Euclidean, Manhattan) on two equal-sized NDArray columns.
TransformProcess.Builder
ndArrayMathFunctionTransform(String columnName, MathFunction mathFunction)
Apply an element-wise mathematical function (sin, tanh, abs etc.) to an NDArray column.
TransformProcess.Builder
ndArrayScalarOpTransform(String columnName, MathOp op, double value)
Element-wise NDArray math operation (add, subtract, etc.) on an NDArray column.
TransformProcess.Builder
normalize(String column, Normalize type, DataAnalysis da)
Normalize the specified column with a given type of normalization.
TransformProcess.Builder
offsetSequence(List<String> columnsToOffset, int offsetAmount, SequenceOffsetTransform.OperationType operationType)
Perform a sequence offset operation on the specified columns.
TransformProcess.Builder
reduce(IAssociativeReducer reducer)
Reduce (i.e., aggregate/combine) a set of examples (typically by key).
TransformProcess.Builder
reduceSequence(IAssociativeReducer reducer)
Reduce (i.e., aggregate/combine) a set of sequence examples - for each sequence individually.
TransformProcess.Builder
reduceSequenceByWindow(IAssociativeReducer reducer, WindowFunction windowFunction)
Reduce (i.e., aggregate/combine) a set of sequence examples - for each sequence individually - using a window function.
TransformProcess.Builder
removeAllColumnsExceptFor(String... columnNames)
Remove all columns, except for those that are specified here.
TransformProcess.Builder
removeAllColumnsExceptFor(Collection<String> columnNames)
Remove all columns, except for those that are specified here.
TransformProcess.Builder
removeColumns(String... columnNames)
Remove all of the specified columns, by name.
TransformProcess.Builder
removeColumns(Collection<String> columnNames)
Remove all of the specified columns, by name.
TransformProcess.Builder
renameColumn(String oldName, String newName)
Rename a single column.
TransformProcess.Builder
renameColumns(List<String> oldNames, List<String> newNames)
Rename multiple columns.
TransformProcess.Builder
reorderColumns(String... newOrder)
Reorder the columns using a partial or complete new ordering.
TransformProcess.Builder
replaceStringTransform(String columnName, Map<String,String> mapping)
Replace one or more String values in the specified column that match regular expressions.
TransformProcess.Builder
sequenceMovingWindowReduce(String columnName, int lookback, ReduceOp op)
SequenceMovingWindowReduceTransform: Adds a new column, where the value is derived by:
(a) using a window of the last N values in a single column,
(b) applying a reduction op on the window to calculate a new value.
This transformer can be used, for example, to implement a simple moving average of the last N values, or to determine the minimum or maximum values in the last N time steps.
TransformProcess.Builder
splitSequence(SequenceSplit split)
Split sequences into 1 or more other sequences.
TransformProcess.Builder
stringMapTransform(String columnName, Map<String,String> mapping)
Replace one or more String values in the specified column with new values.
TransformProcess.Builder
stringRemoveWhitespaceTransform(String columnName)
Remove all whitespace characters from the values in the specified String column.
TransformProcess.Builder
stringToCategorical(String columnName, List<String> stateNames)
Convert the specified String column to a categorical column.
TransformProcess.Builder
stringToTimeTransform(String column, String format, org.joda.time.DateTimeZone dateTimeZone)
Convert a String column (containing a date/time String) to a time column (by parsing the date/time String).
TransformProcess.Builder
stringToTimeTransform(String column, String format, org.joda.time.DateTimeZone dateTimeZone, Locale locale)
Convert a String column (containing a date/time String) to a time column (by parsing the date/time String).
TransformProcess.Builder
timeMathOp(String columnName, MathOp mathOp, long timeQuantity, TimeUnit timeUnit)
Perform a mathematical operation (add, subtract, scalar min/max only) on the specified time column.
TransformProcess.Builder
transform(Transform transform)
Add a transformation to be executed after the previously-added operations have been executed.
TransformProcess.Builder
trimOrPadSequenceToLength(int length, @NonNull List<Writable> pad)
Trim or pad the sequence to the specified length (number of sequence steps).
Sequences longer than the specified maximum will be trimmed to exactly the maximum.
TransformProcess.Builder
trimSequence(int numStepsToTrim, boolean trimFromStart)
SequenceTrimTransform removes the first or last N values in a sequence.
TransformProcess.Builder
trimSequenceToLength(int maxLength)
Trim the sequence to the specified length (number of sequence steps).
Sequences longer than the specified maximum will be trimmed to exactly the maximum.
-
-
-
Constructor Detail
-
Builder
public Builder(Schema initialSchema)
-
-
Method Detail
-
transform
public TransformProcess.Builder transform(Transform transform)
Add a transformation to be executed after the previously-added operations have been executed.
- Parameters:
transform
- Transform to execute
-
filter
public TransformProcess.Builder filter(Filter filter)
Add a filter operation to be executed after the previously-added operations have been executed.
- Parameters:
filter
- Filter operation to execute
-
filter
public TransformProcess.Builder filter(Condition condition)
Add a filter operation, based on the specified condition.
If the condition is satisfied (returns true): remove the example or sequence.
If the condition is not satisfied (returns false): keep the example or sequence.
- Parameters:
condition
- Condition to filter on
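The direction of the condition-based filter is easy to get backwards: a true condition removes the example. The following plain-Java sketch only mirrors that documented rule; it does not use DataVec, and the class name FilterSemanticsSketch and the use of java.util.function.Predicate in place of Condition are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration class; not part of DataVec.
final class FilterSemanticsSketch {

    // Keeps an example only when the condition returns false,
    // matching the documented "condition satisfied => remove" rule.
    static <T> List<T> filter(List<T> examples, Predicate<T> condition) {
        List<T> kept = new ArrayList<>();
        for (T example : examples) {
            if (!condition.test(example)) {
                kept.add(example);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, -1, 12, -7);
        // Condition "value is negative": negative examples are removed.
        System.out.println(filter(values, v -> v < 0)); // prints [5, 12]
    }
}
```

In other words, the condition describes the examples to discard, not the examples to keep.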
-
removeColumns
public TransformProcess.Builder removeColumns(String... columnNames)
Remove all of the specified columns, by name.
- Parameters:
columnNames
- Names of the columns to remove
-
removeColumns
public TransformProcess.Builder removeColumns(Collection<String> columnNames)
Remove all of the specified columns, by name.
- Parameters:
columnNames
- Names of the columns to remove
-
removeAllColumnsExceptFor
public TransformProcess.Builder removeAllColumnsExceptFor(String... columnNames)
Remove all columns, except for those that are specified here.
- Parameters:
columnNames
- Names of the columns to keep
-
removeAllColumnsExceptFor
public TransformProcess.Builder removeAllColumnsExceptFor(Collection<String> columnNames)
Remove all columns, except for those that are specified here.
- Parameters:
columnNames
- Names of the columns to keep
-
renameColumn
public TransformProcess.Builder renameColumn(String oldName, String newName)
Rename a single column.
- Parameters:
oldName
- Original column name
newName
- New column name
-
renameColumns
public TransformProcess.Builder renameColumns(List<String> oldNames, List<String> newNames)
Rename multiple columns.
- Parameters:
oldNames
- List of original column names
newNames
- List of new column names
-
reorderColumns
public TransformProcess.Builder reorderColumns(String... newOrder)
Reorder the columns using a partial or complete new ordering. If only some of the column names are specified for the new order, the remaining columns will be placed at the end, according to their current relative ordering.
- Parameters:
newOrder
- Names of the columns, in the order they will appear in the output
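The partial-ordering rule can be sketched in plain Java as follows. This is only a model of the behaviour described above, not the library implementation; the class ReorderColumnsSketch and the column names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of DataVec.
final class ReorderColumnsSketch {

    // Places the explicitly named columns first, then appends the remaining
    // columns in their current relative order.
    static List<String> reorder(List<String> currentOrder, String... newOrder) {
        List<String> result = new ArrayList<>(Arrays.asList(newOrder));
        for (String name : currentOrder) {
            if (!result.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> current = Arrays.asList("id", "time", "amount", "label");
        // Only two of the four columns are named in the new ordering.
        System.out.println(reorder(current, "label", "id")); // prints [label, id, time, amount]
    }
}
```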
-
duplicateColumn
public TransformProcess.Builder duplicateColumn(String column, String newName)
Duplicate a single column.
- Parameters:
column
- Name of the column to duplicate
newName
- Name of the new (duplicate) column
-
duplicateColumns
public TransformProcess.Builder duplicateColumns(List<String> columnNames, List<String> newNames)
Duplicate a set of columns.
- Parameters:
columnNames
- Names of the columns to duplicate
newNames
- Names of the new (duplicated) columns
-
integerMathOp
public TransformProcess.Builder integerMathOp(String column, MathOp mathOp, int scalar)
Perform a mathematical operation (add, subtract, scalar max etc.) on the specified integer column, with a scalar.
- Parameters:
column
- The integer column to perform the operation on
mathOp
- The mathematical operation
scalar
- The scalar value to use in the mathematical operation
-
integerColumnsMathOp
public TransformProcess.Builder integerColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
Calculate and add a new integer column by performing a mathematical operation on a number of existing columns. The new column is added to the end.
- Parameters:
newColumnName
- Name of the new/derived column
mathOp
- Mathematical operation to execute on the columns
columnNames
- Names of the columns to use in the mathematical operation
-
longMathOp
public TransformProcess.Builder longMathOp(String columnName, MathOp mathOp, long scalar)
Perform a mathematical operation (add, subtract, scalar max etc.) on the specified long column, with a scalar.
- Parameters:
columnName
- The long column to perform the operation on
mathOp
- The mathematical operation
scalar
- The scalar value to use in the mathematical operation
-
longColumnsMathOp
public TransformProcess.Builder longColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
Calculate and add a new long column by performing a mathematical operation on a number of existing columns. The new column is added to the end.
- Parameters:
newColumnName
- Name of the new/derived column
mathOp
- Mathematical operation to execute on the columns
columnNames
- Names of the columns to use in the mathematical operation
-
floatMathOp
public TransformProcess.Builder floatMathOp(String columnName, MathOp mathOp, float scalar)
Perform a mathematical operation (add, subtract, scalar max etc.) on the specified float column, with a scalar.
- Parameters:
columnName
- The float column to perform the operation on
mathOp
- The mathematical operation
scalar
- The scalar value to use in the mathematical operation
-
floatColumnsMathOp
public TransformProcess.Builder floatColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
Calculate and add a new float column by performing a mathematical operation on a number of existing columns. The new column is added to the end.
- Parameters:
newColumnName
- Name of the new/derived column
mathOp
- Mathematical operation to execute on the columns
columnNames
- Names of the columns to use in the mathematical operation
-
floatMathFunction
public TransformProcess.Builder floatMathFunction(String columnName, MathFunction mathFunction)
Perform a mathematical operation (such as sin(x), ceil(x), exp(x) etc.) on a column.
- Parameters:
columnName
- Column name to operate on
mathFunction
- MathFunction to apply to the column
-
doubleMathOp
public TransformProcess.Builder doubleMathOp(String columnName, MathOp mathOp, double scalar)
Perform a mathematical operation (add, subtract, scalar max etc.) on the specified double column, with a scalar.
- Parameters:
columnName
- The double column to perform the operation on
mathOp
- The mathematical operation
scalar
- The scalar value to use in the mathematical operation
-
doubleColumnsMathOp
public TransformProcess.Builder doubleColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
Calculate and add a new double column by performing a mathematical operation on a number of existing columns. The new column is added to the end.
- Parameters:
newColumnName
- Name of the new/derived column
mathOp
- Mathematical operation to execute on the columns
columnNames
- Names of the columns to use in the mathematical operation
-
doubleMathFunction
public TransformProcess.Builder doubleMathFunction(String columnName, MathFunction mathFunction)
Perform a mathematical operation (such as sin(x), ceil(x), exp(x) etc.) on a column.
- Parameters:
columnName
- Column name to operate on
mathFunction
- MathFunction to apply to the column
-
timeMathOp
public TransformProcess.Builder timeMathOp(String columnName, MathOp mathOp, long timeQuantity, TimeUnit timeUnit)
Perform a mathematical operation (add, subtract, scalar min/max only) on the specified time column.
- Parameters:
columnName
- The time column to perform the operation on
mathOp
- The mathematical operation
timeQuantity
- The quantity used in the mathematical operation
timeUnit
- The unit that timeQuantity is specified in
-
categoricalToOneHot
public TransformProcess.Builder categoricalToOneHot(String... columnNames)
Convert the specified column(s) from a categorical representation to a one-hot representation. This involves the creation of multiple new columns for each converted column.
- Parameters:
columnNames
- Names of the categorical column(s) to convert to a one-hot representation
-
categoricalToInteger
public TransformProcess.Builder categoricalToInteger(String... columnNames)
Convert the specified column(s) from a categorical representation to an integer representation. This will replace the specified categorical column(s) with an integer representation, where each integer has a value from 0 to numCategories-1.
- Parameters:
columnNames
- Names of the categorical column(s) to convert to an integer representation
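To make the two categorical encodings concrete, the sketch below encodes a single value both ways in plain Java. It assumes the integer index is the position of the value in the list of state names; the class CategoricalEncodingSketch and the state names are hypothetical, and the naming of the generated one-hot columns in DataVec is not modelled here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of DataVec.
final class CategoricalEncodingSketch {

    // Integer representation: a value in the range 0 to numCategories-1.
    // Assumption: the index is the position in the state-name list.
    static int toInteger(List<String> stateNames, String value) {
        int index = stateNames.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        return index;
    }

    // One-hot representation: one output slot per category, exactly one of them set to 1.
    static int[] toOneHot(List<String> stateNames, String value) {
        int[] encoded = new int[stateNames.size()];
        encoded[toInteger(stateNames, value)] = 1;
        return encoded;
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("low", "medium", "high");
        System.out.println(toInteger(states, "medium"));                 // prints 1
        System.out.println(Arrays.toString(toOneHot(states, "medium"))); // prints [0, 1, 0]
    }
}
```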
-
integerToCategorical
public TransformProcess.Builder integerToCategorical(String columnName, List<String> categoryStateNames)
Convert the specified column from an integer representation (assuming values 0 to numCategories-1) to a categorical representation, given the specified state names.
- Parameters:
columnName
- Name of the column to convert
categoryStateNames
- Names of the states for the categorical column
-
integerToCategorical
public TransformProcess.Builder integerToCategorical(String columnName, Map<Integer,String> categoryIndexNameMap)
Convert the specified column from an integer representation to a categorical representation, given the specified mapping between integer indexes and state names.
- Parameters:
columnName
- Name of the column to convert
categoryIndexNameMap
- Mapping from integer index to state name for the categorical column
-
integerToOneHot
public TransformProcess.Builder integerToOneHot(String columnName, int minValue, int maxValue)
Convert an integer column to a set of one-hot columns, based on the value in the integer column.
- Parameters:
columnName
- Name of the integer column
minValue
- Minimum value possible for the integer column (inclusive)
maxValue
- Maximum value possible for the integer column (inclusive)
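Because both bounds are inclusive, the number of one-hot positions is maxValue - minValue + 1. A minimal plain-Java sketch of that arithmetic follows; the class IntegerToOneHotSketch is hypothetical, and the out-of-range check is an assumption about reasonable behaviour rather than a statement about the library.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of DataVec.
final class IntegerToOneHotSketch {

    // Encodes a value in [minValue, maxValue] (both inclusive) as a one-hot array.
    static int[] encode(int value, int minValue, int maxValue) {
        if (value < minValue || value > maxValue) {
            // Assumption: out-of-range values are treated as errors in this sketch.
            throw new IllegalArgumentException("Value out of range: " + value);
        }
        int[] encoded = new int[maxValue - minValue + 1];
        encoded[value - minValue] = 1;
        return encoded;
    }

    public static void main(String[] args) {
        // Range 2..5 inclusive gives 4 positions; the value 3 sets the second one.
        System.out.println(Arrays.toString(encode(3, 2, 5))); // prints [0, 1, 0, 0]
    }
}
```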
-
addConstantColumn
public TransformProcess.Builder addConstantColumn(String newColumnName, ColumnType newColumnType, Writable fixedValue)
Add a new column, where all values in the column are identical and as specified.
- Parameters:
newColumnName
- Name of the new column
newColumnType
- Type of the new column
fixedValue
- Value in the new column for all records
-
addConstantDoubleColumn
public TransformProcess.Builder addConstantDoubleColumn(String newColumnName, double value)
Add a new double column, where the value for that column is identical for all records.
- Parameters:
newColumnName
- Name of the new column
value
- Value in the new column for all records
-
addConstantIntegerColumn
public TransformProcess.Builder addConstantIntegerColumn(String newColumnName, int value)
Add a new integer column, where the value for that column is identical for all records.
- Parameters:
newColumnName
- Name of the new column
value
- Value in the new column for all records
-
addConstantLongColumn
public TransformProcess.Builder addConstantLongColumn(String newColumnName, long value)
Add a new long column, where the value for that column is identical for all records.
- Parameters:
newColumnName
- Name of the new column
value
- Value in the new column for all records
-
convertToString
public TransformProcess.Builder convertToString(String inputColumn)
Convert the specified column to a string.
- Parameters:
inputColumn
- The input column to convert
- Returns:
- This builder, to allow method chaining
-
convertToDouble
public TransformProcess.Builder convertToDouble(String inputColumn)
Convert the specified column to a double.
- Parameters:
inputColumn
- The input column to convert
- Returns:
- This builder, to allow method chaining
-
convertToInteger
public TransformProcess.Builder convertToInteger(String inputColumn)
Convert the specified column to an integer.
- Parameters:
inputColumn
- The input column to convert
- Returns:
- This builder, to allow method chaining
-
normalize
public TransformProcess.Builder normalize(String column, Normalize type, DataAnalysis da)
Normalize the specified column with a given type of normalization.
- Parameters:
column
- Column to normalize
type
- Type of normalization to apply
da
- DataAnalysis object
-
convertToSequence
public TransformProcess.Builder convertToSequence(String keyColumn, SequenceComparator comparator)
Convert a set of independent records/examples into a sequence, according to some key. Within each sequence, values are ordered using the provided SequenceComparator.
- Parameters:
keyColumn
- Column to use as a key (values with the same key will be combined into sequences)
comparator
- A SequenceComparator to order the values within each sequence (for example, by time or String order)
-
convertToSequence
public TransformProcess.Builder convertToSequence()
Convert a set of independent records/examples into a sequence; each example is simply treated as a sequence of length 1, without any join/group operations. Note that more commonly, joining/grouping is required; use convertToSequence(List, SequenceComparator) for this functionality.
-
convertToSequence
public TransformProcess.Builder convertToSequence(List<String> keyColumns, SequenceComparator comparator)
Convert a set of independent records/examples into a sequence, where each sequence is grouped according to one or more key values (i.e., the values in one or more columns). Within each sequence, values are ordered using the provided SequenceComparator.
- Parameters:
keyColumns
- Columns to use as keys (values with the same key will be combined into sequences)
comparator
- A SequenceComparator to order the values within each sequence (for example, by time or String order)
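The group-then-order behaviour of the keyed variants can be modelled with standard collections. The sketch below is a plain-Java approximation only: the Reading record type, the single "sensor" key and the time-based comparator are all hypothetical stand-ins for DataVec records, key columns and a SequenceComparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of DataVec.
final class ConvertToSequenceSketch {

    // Hypothetical record type: a key column ("sensor") and a time column.
    static final class Reading {
        final String sensor;
        final long time;

        Reading(String sensor, long time) {
            this.sensor = sensor;
            this.time = time;
        }

        @Override
        public String toString() {
            return sensor + "@" + time;
        }
    }

    // Step 1: group independent records by key. Step 2: order each group with the comparator.
    static Map<String, List<Reading>> toSequences(List<Reading> records, Comparator<Reading> order) {
        Map<String, List<Reading>> sequences = new LinkedHashMap<>();
        for (Reading record : records) {
            sequences.computeIfAbsent(record.sensor, key -> new ArrayList<>()).add(record);
        }
        for (List<Reading> sequence : sequences.values()) {
            sequence.sort(order);
        }
        return sequences;
    }

    public static void main(String[] args) {
        List<Reading> records = Arrays.asList(
                new Reading("a", 3), new Reading("b", 1), new Reading("a", 1), new Reading("b", 2));
        Comparator<Reading> byTime = Comparator.comparingLong(r -> r.time);
        System.out.println(toSequences(records, byTime)); // prints {a=[a@1, a@3], b=[b@1, b@2]}
    }
}
```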
-
convertFromSequence
public TransformProcess.Builder convertFromSequence()
Convert a sequence to a set of individual values (by treating each value in each sequence as a separate example).
-
splitSequence
public TransformProcess.Builder splitSequence(SequenceSplit split)
Split sequences into 1 or more other sequences. Used, for example, to split large sequences into a set of smaller sequences.
- Parameters:
split
- SequenceSplit that defines how splits will occur
-
trimSequence
public TransformProcess.Builder trimSequence(int numStepsToTrim, boolean trimFromStart)
SequenceTrimTransform removes the first or last N values in a sequence. Note that the resulting sequence may be of length 0, if the length of the input sequence is less than or equal to N.
- Parameters:
numStepsToTrim
- Number of time steps to trim from the sequence
trimFromStart
- If true: trim values from the start of the sequence. If false: trim values from the end.
-
trimSequenceToLength
public TransformProcess.Builder trimSequenceToLength(int maxLength)
Trim the sequence to the specified length (number of sequence steps).
Sequences longer than the specified maximum will be trimmed to exactly the maximum. Shorter sequences will not be modified.
- Parameters:
maxLength
- Maximum sequence length (number of time steps)
-
trimOrPadSequenceToLength
public TransformProcess.Builder trimOrPadSequenceToLength(int length, @NonNull List<Writable> pad)
Trim or pad the sequence to the specified length (number of sequence steps).
Sequences longer than the specified maximum will be trimmed to exactly the maximum. Shorter sequences will be padded with as many copies of the "pad" list as are needed to make the sequence length equal to the specified maximum.
Note that the "pad" list (i.e., the values to pad with) must be equal in length to the number of columns (values per time step).
- Parameters:
length
- Required length - trim sequences longer than this, pad sequences shorter than this
pad
- Values to pad at the end of the sequence
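A plain-Java sketch of the trim-or-pad rule is shown below. It models a sequence as a list of time steps, each holding one value per column. It assumes that trimming drops steps from the end of the sequence, which the description above does not state explicitly; the class TrimOrPadSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of DataVec.
final class TrimOrPadSketch {

    // Returns a sequence of exactly `length` steps: existing steps first,
    // then copies of `pad` for any missing steps.
    // Assumption: longer sequences are trimmed by dropping steps from the end.
    static <T> List<List<T>> trimOrPad(List<List<T>> sequence, int length, List<T> pad) {
        List<List<T>> result = new ArrayList<>();
        for (int step = 0; step < length; step++) {
            result.add(step < sequence.size() ? sequence.get(step) : pad);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> sequence = Arrays.asList(
                Arrays.asList(1, 10), Arrays.asList(2, 20), Arrays.asList(3, 30));
        List<Integer> pad = Arrays.asList(0, 0); // one pad value per column

        System.out.println(trimOrPad(sequence, 2, pad)); // prints [[1, 10], [2, 20]]
        System.out.println(trimOrPad(sequence, 5, pad)); // prints [[1, 10], [2, 20], [3, 30], [0, 0], [0, 0]]
    }
}
```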
-
offsetSequence
public TransformProcess.Builder offsetSequence(List<String> columnsToOffset, int offsetAmount, SequenceOffsetTransform.OperationType operationType)
Perform a sequence offset operation on the specified columns. Note that this also truncates sequences by the specified offset amount by default. Use transform(new SequenceOffsetTransform(...)) to change this. See SequenceOffsetTransform for details on exactly what this operation does and how.
- Parameters:
columnsToOffset
- Columns to offset
offsetAmount
- Amount to offset the specified columns by (positive offset: 'columnsToOffset' are moved to later time steps)
operationType
- Whether the offset should be done in-place or by adding a new column
-
reduce
public TransformProcess.Builder reduce(IAssociativeReducer reducer)
Reduce (i.e., aggregate/combine) a set of examples (typically by key). Note: In the current implementation, reduction operations can be performed only on standard (i.e., non-sequence) data.
- Parameters:
reducer
- Reducer to use
-
reduceSequence
public TransformProcess.Builder reduceSequence(IAssociativeReducer reducer)
Reduce (i.e., aggregate/combine) a set of sequence examples - for each sequence individually. Note: This method results in non-sequence data. If you would instead prefer sequences of length 1 after the reduction, use transform(new ReduceSequenceTransform(reducer)).
- Parameters:
reducer
- Reducer to use to reduce each sequence
-
reduceSequenceByWindow
public TransformProcess.Builder reduceSequenceByWindow(IAssociativeReducer reducer, WindowFunction windowFunction)
Reduce (i.e., aggregate/combine) a set of sequence examples - for each sequence individually - using a window function. For example, take all records/examples in each 24-hour period (i.e., using a window function), and convert them into a single value (using the reducer). In this example, the output is a sequence, with a time period of 24 hours.
- Parameters:
reducer
- Reducer to use to reduce each window
windowFunction
- Window function to apply to each sequence individually
-
sequenceMovingWindowReduce
public TransformProcess.Builder sequenceMovingWindowReduce(String columnName, int lookback, ReduceOp op)
SequenceMovingWindowReduceTransform: Adds a new column, where the value is derived by:
(a) using a window of the last N values in a single column,
(b) applying a reduction op on the window to calculate a new value.
This transformer can be used, for example, to implement a simple moving average of the last N values, or to determine the minimum or maximum values in the last N time steps.
For a simple moving average of length 20:
new SequenceMovingWindowReduceTransform("myCol", 20, ReduceOp.Mean)
- Parameters:
columnName
- Column name to perform windowing on
lookback
- Look back period for windowing
op
- Reduction operation to perform on each window
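The windowed reduction can be illustrated with a simple moving mean in plain Java. Two details are assumptions of this sketch, not facts from the text above: the window includes the current time step, and early time steps with fewer than N values available are reduced over the values that do exist. The class MovingWindowSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of DataVec.
final class MovingWindowSketch {

    // Computes, for each time step, the mean of the last `lookback` values.
    // Assumptions: the window ends at (and includes) the current step, and
    // shorter windows are used at the start of the sequence.
    static double[] movingMean(double[] column, int lookback) {
        double[] result = new double[column.length];
        for (int t = 0; t < column.length; t++) {
            int start = Math.max(0, t - lookback + 1);
            double sum = 0.0;
            for (int i = start; i <= t; i++) {
                sum += column[i];
            }
            result[t] = sum / (t - start + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] column = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(movingMean(column, 3))); // prints [1.0, 1.5, 2.0, 3.0, 4.0]
    }
}
```

Swapping the inner loop for a running minimum or maximum gives the other reductions mentioned above.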
-
calculateSortedRank
public TransformProcess.Builder calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator)
CalculateSortedRank: calculate the rank of each example after sorting the examples. For example, we might have some numerical "score" column, and we want to know the rank (sort order) of each example according to that column.
The rank of each example (after sorting) will be added in a new Long column. Indexing is done from 0; examples will have values 0 to dataSetSize-1.
Currently, CalculateSortedRank can only be applied to standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.
- Parameters:
newColumnName
- Name of the new column (will contain the rank for each example)
sortOnColumn
- Column to sort on
comparator
- Comparator used to sort examples
-
calculateSortedRank
public TransformProcess.Builder calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator, boolean ascending)
CalculateSortedRank: calculate the rank of each example after sorting the examples. For example, we might have some numerical "score" column, and we want to know the rank (sort order) of each example according to that column.
The rank of each example (after sorting) will be added in a new Long column. Indexing is done from 0; examples will have values 0 to dataSetSize-1.
Currently, CalculateSortedRank can only be applied to standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.
- Parameters:
newColumnName
- Name of the new column (will contain the rank for each example)
sortOnColumn
- Column to sort on
comparator
- Comparator used to sort examples
ascending
- If true: sort ascending. If false: sort descending
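The zero-based rank described above can be sketched in plain Java as follows. This is only an illustration of the idea, not the DataVec implementation: the class and method names are hypothetical, and a plain double comparison stands in for the WritableComparator.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of a zero-based sorted rank; not DataVec code.
class SortedRankSketch {
    /** Returns, for each example, its position (0 to n-1) after sorting by score. */
    static long[] rank(double[] scores, boolean ascending) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Comparator<Integer> byScore = Comparator.comparingDouble(i -> scores[i]);
        Arrays.sort(order, ascending ? byScore : byScore.reversed());

        long[] ranks = new long[scores.length];
        for (int pos = 0; pos < order.length; pos++) {
            ranks[order[pos]] = pos; // the example at sorted position `pos` gets rank `pos`
        }
        return ranks;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rank(new double[]{0.5, 0.9, 0.1}, true)));
    }
}
```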
-
stringToCategorical
public TransformProcess.Builder stringToCategorical(String columnName, List<String> stateNames)
Convert the specified String column to a categorical column. The state names must be provided.
- Parameters:
columnName
- Name of the String column to convert to categorical
stateNames
- State names of the category
-
stringRemoveWhitespaceTransform
public TransformProcess.Builder stringRemoveWhitespaceTransform(String columnName)
Remove all whitespace characters from the values in the specified String column.
- Parameters:
columnName
- Name of the column to remove whitespace from
-
stringMapTransform
public TransformProcess.Builder stringMapTransform(String columnName, Map<String,String> mapping)
Replace one or more String values in the specified column with new values. Keys in the map are the original values; values in the map are their replacements. If a String appears in the data but does not appear in the provided map (as a key), that String value will not be modified.
- Parameters:
columnName
- Name of the column in which to do replacement
mapping
- Map of oldValues -> newValues
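The lookup rule (replace mapped values, leave everything else untouched) can be sketched in plain Java. The class and method names below are hypothetical and the code does not use DataVec.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of exact-match string mapping; not DataVec code.
class StringMapSketch {
    /** Returns the replacement if `value` is a key in the map, otherwise `value` unchanged. */
    static String mapValue(String value, Map<String, String> mapping) {
        return mapping.getOrDefault(value, value);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("NY", "New York");
        System.out.println(mapValue("NY", mapping));
        System.out.println(mapValue("LA", mapping)); // not a key, so left as-is
    }
}
```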
-
stringToTimeTransform
public TransformProcess.Builder stringToTimeTransform(String column, String format, org.joda.time.DateTimeZone dateTimeZone)
Convert a String column (containing a date/time String) to a time column (by parsing the date/time String).
- Parameters:
column
- String column containing the date/time Strings
format
- Format of the strings. Time format is specified as per http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
dateTimeZone
- Timezone of the column
-
stringToTimeTransform
public TransformProcess.Builder stringToTimeTransform(String column, String format, org.joda.time.DateTimeZone dateTimeZone, Locale locale)
Convert a String column (containing a date/time String) to a time column (by parsing the date/time String).
- Parameters:
column
- String column containing the date/time Strings
format
- Format of the strings. Time format is specified as per http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
dateTimeZone
- Timezone of the column
locale
- Locale of the column
-
appendStringColumnTransform
public TransformProcess.Builder appendStringColumnTransform(String column, String toAppend)
Append a String to a specified column.
- Parameters:
column
- Column to append the value to
toAppend
- String to append to the end of each writable
-
conditionalReplaceValueTransform
public TransformProcess.Builder conditionalReplaceValueTransform(String column, Writable newValue, Condition condition)
Replace the values in a specified column with a specified new value, if some condition holds. If the condition does not hold, the original values are not modified.
- Parameters:
column
- Column to operate on
newValue
- Value to use as replacement, if condition is satisfied
condition
- Condition that must be satisfied for replacement
-
conditionalReplaceValueTransformWithDefault
public TransformProcess.Builder conditionalReplaceValueTransformWithDefault(String column, Writable yesVal, Writable noVal, Condition condition)
Replace the values in a specified column with a specified "yes" value, if some condition holds. Otherwise, replace them with a "no" value.
- Parameters:
column
- Column to operate on
yesVal
- Value to use as replacement, if condition is satisfied
noVal
- Value to use as replacement, if condition is not satisfied
condition
- Condition that must be satisfied for replacement
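The difference between the plain conditional replacement and the "with default" variant can be illustrated with a small plain-Java sketch. The names are hypothetical, and an IntPredicate over an int column stands in for the Condition and Writable types.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical illustration of conditional replacement; not DataVec code.
class ConditionalReplaceSketch {
    /** Replaces values that satisfy the condition; other values are kept. */
    static int[] replaceIf(int[] column, int newValue, IntPredicate condition) {
        int[] out = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = condition.test(column[i]) ? newValue : column[i];
        }
        return out;
    }

    /** Replaces every value: yesVal where the condition holds, noVal otherwise. */
    static int[] replaceWithDefault(int[] column, int yesVal, int noVal, IntPredicate condition) {
        int[] out = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = condition.test(column[i]) ? yesVal : noVal;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] column = {5, -1, 7};
        System.out.println(Arrays.toString(replaceIf(column, 0, v -> v < 0)));
        System.out.println(Arrays.toString(replaceWithDefault(column, 1, 0, v -> v < 0)));
    }
}
```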
-
conditionalCopyValueTransform
public TransformProcess.Builder conditionalCopyValueTransform(String columnToReplace, String sourceColumn, Condition condition)
Replace the value in a specified column with a new value taken from another column, if a condition is satisfied/true.
Note that the condition can be any generic condition, including one on other column(s), different from the column that will be modified if the condition is satisfied/true.
- Parameters:
columnToReplace
- Name of the column in which values will be replaced (if condition is satisfied)
sourceColumn
- Name of the column from which the new values will be taken
condition
- Condition to use
-
replaceStringTransform
public TransformProcess.Builder replaceStringTransform(String columnName, Map<String,String> mapping)
Replace one or more String values in the specified column that match regular expressions. Keys in the map are the regular expressions; values in the map are their String replacements. For example:
Original     Regex          Replacement   Result
Data_Vec     _              (empty)       DataVec
B1C2T3       \\d            one           BoneConeTone
'  4.25 '    ^\\s+|\\s+$    (empty)       '4.25'
- Parameters:
columnName
- Name of the column in which to do replacement
mapping
- Map of old values or regular expression to new values
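The three rows of the example can be reproduced with the JDK's own regex replacement. This is a sketch under the assumption that each map entry is applied in turn with String.replaceAll; the class and method names are hypothetical and the code does not use DataVec.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of regex-based replacement; not DataVec code.
class ReplaceStringSketch {
    /** Applies every regex -> replacement entry to the value, in map iteration order. */
    static String replaceAll(String value, Map<String, String> mapping) {
        String result = value;
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            result = result.replaceAll(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("\\d", "one"); // Java source needs the doubled backslash
        System.out.println(replaceAll("B1C2T3", mapping));
    }
}
```

Note that in String.replaceAll the replacement string treats `$` and `\` specially, so literal replacements containing those characters would need Matcher.quoteReplacement.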
-
ndArrayScalarOpTransform
public TransformProcess.Builder ndArrayScalarOpTransform(String columnName, MathOp op, double value)
Element-wise NDArray math operation (add, subtract, etc.) on an NDArray column.
- Parameters:
columnName
- Name of the NDArray column to perform the operation on
op
- Operation to perform
value
- Value for the operation
-
ndArrayColumnsMathOpTransform
public TransformProcess.Builder ndArrayColumnsMathOpTransform(String newColumnName, MathOp mathOp, String... columnNames)
Perform an element-wise mathematical operation (such as add, subtract, multiply) on NDArray columns. The existing columns are unchanged; a new NDArray column is added.
- Parameters:
newColumnName
- Name of the new NDArray column
mathOp
- Operation to perform
columnNames
- Names of the columns used as input to the operation
-
ndArrayMathFunctionTransform
public TransformProcess.Builder ndArrayMathFunctionTransform(String columnName, MathFunction mathFunction)
Apply an element-wise mathematical function (sin, tanh, abs, etc.) to an NDArray column. This operation is performed in place.
- Parameters:
columnName
- Name of the column to perform the operation on
mathFunction
- Mathematical function to apply
-
ndArrayDistanceTransform
public TransformProcess.Builder ndArrayDistanceTransform(String newColumnName, Distance distance, String firstCol, String secondCol)
Calculate a distance (cosine similarity, Euclidean, Manhattan) on two equal-sized NDArray columns. This operation adds a new Double column (with the specified name) with the result.
- Parameters:
newColumnName
- Name of the new column (result) to add
distance
- Distance to apply
firstCol
- First column to use in the distance calculation
secondCol
- Second column to use in the distance calculation
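For reference, the three measures named above follow their standard definitions, sketched here on plain double arrays. The class and method names are hypothetical; the code shows the textbook formulas and makes no claim about how DataVec computes them internally.

```java
// Hypothetical illustration of the three distance measures; not DataVec code.
class NdArrayDistanceSketch {
    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Arrays must have equal size");
        }
    }

    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sumSq = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sumSq += d * d;
        }
        return Math.sqrt(sumSq);
    }

    static double manhattan(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Dot product divided by the product of the norms; undefined for all-zero vectors. */
    static double cosineSimilarity(double[] a, double[] b) {
        checkSameLength(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] a = {1, 0};
        double[] b = {0, 1};
        System.out.println(euclidean(a, b) + " " + manhattan(a, b) + " " + cosineSimilarity(a, b));
    }
}
```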
-
firstDigitTransform
public TransformProcess.Builder firstDigitTransform(String inputColumn, String outputColumn)
FirstDigitTransform converts a column to a categorical column, with values being the first digit of the number.
For example, "3.1415" becomes "3" and "2.0" becomes "2".
Negative numbers ignore the sign: "-7.123" becomes "7".
Note that two FirstDigitTransform.Mode values are supported, which determine how non-numerical entries should be handled:
EXCEPTION_ON_INVALID: output has 10 category values ("0", ..., "9"), and any non-numerical values result in an exception
INCLUDE_OTHER_CATEGORY: output has 11 category values ("0", ..., "9", "Other"), all non-numerical values are mapped to "Other"
FirstDigitTransform is useful (combined with CategoricalToOneHotTransform and Reductions) to implement Benford's law.
- Parameters:
inputColumn
- Input column name
outputColumn
- Output column name. If same as input, input column is replaced
-
firstDigitTransform
public TransformProcess.Builder firstDigitTransform(String inputColumn, String outputColumn, FirstDigitTransform.Mode mode)
FirstDigitTransform converts a column to a categorical column, with values being the first digit of the number.
For example, "3.1415" becomes "3" and "2.0" becomes "2".
Negative numbers ignore the sign: "-7.123" becomes "7".
Note that two FirstDigitTransform.Mode values are supported, which determine how non-numerical entries should be handled:
EXCEPTION_ON_INVALID: output has 10 category values ("0", ..., "9"), and any non-numerical values result in an exception
INCLUDE_OTHER_CATEGORY: output has 11 category values ("0", ..., "9", "Other"), all non-numerical values are mapped to "Other"
FirstDigitTransform is useful (combined with CategoricalToOneHotTransform and Reductions) to implement Benford's law.
- Parameters:
inputColumn
- Input column name
outputColumn
- Output column name. If same as input, input column is replaced
mode
- See FirstDigitTransform.Mode
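The rules above (take the first digit, ignore a leading minus sign, handle non-numerical entries according to the mode) can be sketched in plain Java. The class below is a hypothetical stand-in, not the DataVec class: its local Mode enum only mirrors the two documented mode names, and the exception type is an assumption.

```java
// Hypothetical illustration of first-digit extraction; not DataVec code.
class FirstDigitSketch {
    enum Mode { EXCEPTION_ON_INVALID, INCLUDE_OTHER_CATEGORY }

    /** Returns the first digit as a category string, skipping one leading '-' if present. */
    static String firstDigit(String value, Mode mode) {
        String s = (value == null) ? "" : value;
        int start = (s.length() > 1 && s.charAt(0) == '-') ? 1 : 0;
        if (start < s.length()) {
            char c = s.charAt(start);
            if (c >= '0' && c <= '9') {
                return String.valueOf(c);
            }
        }
        if (mode == Mode.INCLUDE_OTHER_CATEGORY) {
            return "Other"; // 11th category for non-numerical entries
        }
        throw new IllegalStateException("Non-numerical value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(firstDigit("3.1415", Mode.EXCEPTION_ON_INVALID));
        System.out.println(firstDigit("-7.123", Mode.EXCEPTION_ON_INVALID));
        System.out.println(firstDigit("n/a", Mode.INCLUDE_OTHER_CATEGORY));
    }
}
```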
-
build
public TransformProcess build()
Create the TransformProcess object
-
-