public static class TransformProcess.Builder extends Object
Modifier and Type | Method and Description |
---|---|
TransformProcess.Builder | appendStringColumnTransform(String column, String toAppend): Append a String to a specified column. |
TransformProcess | build(): Create the TransformProcess object. |
TransformProcess.Builder | calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator): Calculate the rank of each example, after sorting the examples. |
TransformProcess.Builder | calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator, boolean ascending): Calculate the rank of each example, after sorting the examples. |
TransformProcess.Builder | categoricalToInteger(String... columnNames): Convert the specified column(s) from a categorical representation to an integer representation. |
TransformProcess.Builder | categoricalToOneHot(String... columnNames): Convert the specified column(s) from a categorical representation to a one-hot representation. |
TransformProcess.Builder | conditionalCopyValueTransform(String columnToReplace, String sourceColumn, Condition condition): Replace the value in a specified column with a new value taken from another column, if a condition is satisfied. The condition can be any generic condition, including one on columns other than the column that will be modified. |
TransformProcess.Builder | conditionalReplaceValueTransform(String column, Writable newValue, Condition condition): Replace the values in a specified column with a specified new value, if some condition holds. |
TransformProcess.Builder | convertFromSequence(): Convert a sequence to a set of individual values (by treating each value in each sequence as a separate example). |
TransformProcess.Builder | convertToSequence(String keyColumn, SequenceComparator comparator): Convert a set of independent records/examples into a sequence, according to some key. |
TransformProcess.Builder | doubleColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames): Calculate and add a new double column by performing a mathematical operation on a number of existing columns. |
TransformProcess.Builder | doubleMathOp(String columnName, MathOp mathOp, double scalar): Perform a mathematical operation (add, subtract, scalar max, etc.) on the specified double column, with a scalar. |
TransformProcess.Builder | duplicateColumn(String column, String newName): Duplicate a single column. |
TransformProcess.Builder | duplicateColumns(List<String> columnNames, List<String> newNames): Duplicate a set of columns. |
TransformProcess.Builder | filter(Filter filter): Add a filter operation to be executed after the previously added operations have been executed. |
TransformProcess.Builder | integerColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames): Calculate and add a new integer column by performing a mathematical operation on a number of existing columns. |
TransformProcess.Builder | integerMathOp(String column, MathOp mathOp, int scalar): Perform a mathematical operation (add, subtract, scalar max, etc.) on the specified integer column, with a scalar. |
TransformProcess.Builder | integerToCategorical(String columnName, List<String> categoryStateNames): Convert the specified column from an integer representation (assume values 0 to numCategories-1) to a categorical representation, given the specified state names. |
TransformProcess.Builder | integerToCategorical(String columnName, Map<Integer,String> categoryIndexNameMap): Convert the specified column from an integer representation to a categorical representation, given the specified mapping between integer indexes and state names. |
TransformProcess.Builder | longColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames): Calculate and add a new long column by performing a mathematical operation on a number of existing columns. |
TransformProcess.Builder | longMathOp(String columnName, MathOp mathOp, long scalar): Perform a mathematical operation (add, subtract, scalar max, etc.) on the specified long column, with a scalar. |
TransformProcess.Builder | normalize(String column, Normalize type, DataAnalysis da): Normalize the specified column with a given type of normalization. |
TransformProcess.Builder | reduce(IReducer reducer): Reduce (i.e., aggregate/combine) a set of examples (typically by key). |
TransformProcess.Builder | reduceSequenceByWindow(IReducer reducer, WindowFunction windowFunction): Reduce (i.e., aggregate/combine) a set of sequence examples, for each sequence individually, using a window function. |
TransformProcess.Builder | removeAllColumnsExceptFor(Collection<String> columnNames): Remove all columns, except for those that are specified here. |
TransformProcess.Builder | removeAllColumnsExceptFor(String... columnNames): Remove all columns, except for those that are specified here. |
TransformProcess.Builder | removeColumns(Collection<String> columnNames): Remove all of the specified columns, by name. |
TransformProcess.Builder | removeColumns(String... columnNames): Remove all of the specified columns, by name. |
TransformProcess.Builder | renameColumn(String oldName, String newName): Rename a single column. |
TransformProcess.Builder | renameColumns(List<String> oldNames, List<String> newNames): Rename multiple columns. |
TransformProcess.Builder | reorderColumns(String... newOrder): Reorder the columns using a partial or complete new ordering. |
TransformProcess.Builder | splitSequence(SequenceSplit split): Split sequences into 1 or more other sequences. |
TransformProcess.Builder | stringMapTransform(String columnName, Map<String,String> mapping): Replace one or more String values in the specified column with new values. |
TransformProcess.Builder | stringRemoveWhitespaceTransform(String columnName): Remove all whitespace characters from the values in the specified String column. |
TransformProcess.Builder | stringToCategorical(String columnName, List<String> stateNames): Convert the specified String column to a categorical column. |
TransformProcess.Builder | stringToTimeTransform(String column, String format, org.joda.time.DateTimeZone dateTimeZone): Convert a String column (containing a date/time String) to a time column (by parsing the date/time String). |
TransformProcess.Builder | timeMathOp(String columnName, MathOp mathOp, long timeQuantity, TimeUnit timeUnit): Perform a mathematical operation (add, subtract, scalar min/max only) on the specified time column. |
TransformProcess.Builder | transform(Transform transform): Add a transformation to be executed after the previously added operations have been executed. |
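
The difference between the integer and one-hot representations listed for categoricalToInteger and categoricalToOneHot can be illustrated in plain Java. This is only a conceptual sketch and not the library's implementation: the class and method names are hypothetical, and it assumes the integer value is the position of the category in the column's ordered list of state names, which this page does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; none of these names belong to the library.
final class CategoricalEncodingSketch {

    // Integer representation: index of the value in the ordered list of state names
    // (assumed mapping, mirroring the 0..numCategories-1 convention of integerToCategorical).
    static int toInteger(String value, List<String> stateNames) {
        int index = stateNames.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        return index;
    }

    // One-hot representation: one slot per state, 1 at the value's index and 0 elsewhere.
    static List<Integer> toOneHot(String value, List<String> stateNames) {
        int index = toInteger(value, stateNames);
        List<Integer> encoded = new ArrayList<>();
        for (int i = 0; i < stateNames.size(); i++) {
            encoded.add(i == index ? 1 : 0);
        }
        return encoded;
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("red", "green", "blue");
        System.out.println(toInteger("green", states)); // prints 1
        System.out.println(toOneHot("green", states));  // prints [0, 1, 0]
    }
}
```

The one-hot form grows with the number of states, while the integer form stays a single value; which one is appropriate depends on what consumes the output.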
public Builder(Schema initialSchema)

public TransformProcess.Builder transform(Transform transform)
transform - Transform to execute

public TransformProcess.Builder filter(Filter filter)
filter - Filter operation to execute

public TransformProcess.Builder removeColumns(String... columnNames)
columnNames - Names of the columns to remove

public TransformProcess.Builder removeColumns(Collection<String> columnNames)
columnNames - Names of the columns to remove

public TransformProcess.Builder removeAllColumnsExceptFor(String... columnNames)
columnNames - Names of the columns to keep

public TransformProcess.Builder removeAllColumnsExceptFor(Collection<String> columnNames)
columnNames - Names of the columns to keep

public TransformProcess.Builder renameColumn(String oldName, String newName)
oldName - Original column name
newName - New column name

public TransformProcess.Builder renameColumns(List<String> oldNames, List<String> newNames)
oldNames - List of original column names
newNames - List of new column names

public TransformProcess.Builder reorderColumns(String... newOrder)
newOrder - Names of the columns, in the order they will appear in the output

public TransformProcess.Builder duplicateColumn(String column, String newName)
column - Name of the column to duplicate
newName - Name of the new (duplicate) column

public TransformProcess.Builder duplicateColumns(List<String> columnNames, List<String> newNames)
columnNames - Names of the columns to duplicate
newNames - Names of the new (duplicated) columns

public TransformProcess.Builder integerMathOp(String column, MathOp mathOp, int scalar)
column - The integer column to perform the operation on
mathOp - The mathematical operation
scalar - The scalar value to use in the mathematical operation

public TransformProcess.Builder integerColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
newColumnName - Name of the new/derived column
mathOp - Mathematical operation to execute on the columns
columnNames - Names of the columns to use in the mathematical operation

public TransformProcess.Builder longMathOp(String columnName, MathOp mathOp, long scalar)
columnName - The long column to perform the operation on
mathOp - The mathematical operation
scalar - The scalar value to use in the mathematical operation

public TransformProcess.Builder longColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
newColumnName - Name of the new/derived column
mathOp - Mathematical operation to execute on the columns
columnNames - Names of the columns to use in the mathematical operation

public TransformProcess.Builder doubleMathOp(String columnName, MathOp mathOp, double scalar)
columnName - The double column to perform the operation on
mathOp - The mathematical operation
scalar - The scalar value to use in the mathematical operation

public TransformProcess.Builder doubleColumnsMathOp(String newColumnName, MathOp mathOp, String... columnNames)
newColumnName - Name of the new/derived column
mathOp - Mathematical operation to execute on the columns
columnNames - Names of the columns to use in the mathematical operation

public TransformProcess.Builder timeMathOp(String columnName, MathOp mathOp, long timeQuantity, TimeUnit timeUnit)
columnName - The time column to perform the operation on
mathOp - The mathematical operation
timeQuantity - The quantity used in the mathematical operation
timeUnit - The unit that timeQuantity is specified in

public TransformProcess.Builder categoricalToOneHot(String... columnNames)
columnNames - Names of the categorical column(s) to convert to a one-hot representation

public TransformProcess.Builder categoricalToInteger(String... columnNames)
columnNames - Names of the categorical column(s) to convert to an integer representation

public TransformProcess.Builder integerToCategorical(String columnName, List<String> categoryStateNames)
columnName - Name of the column to convert
categoryStateNames - Names of the states for the categorical column

public TransformProcess.Builder integerToCategorical(String columnName, Map<Integer,String> categoryIndexNameMap)
columnName - Name of the column to convert
categoryIndexNameMap - Mapping between integer indexes and state names for the categorical column

public TransformProcess.Builder normalize(String column, Normalize type, DataAnalysis da)
column - Column to normalize
type - Type of normalization to apply
da - DataAnalysis object

public TransformProcess.Builder convertToSequence(String keyColumn, SequenceComparator comparator)
SequenceComparator
keyColumn - Column to use as a key (values with the same key will be combined into sequences)
comparator - A SequenceComparator to order the values within each sequence (for example, by time or String order)

public TransformProcess.Builder convertFromSequence()
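
The behaviour described for convertToSequence (combine records that share a key, then order the values within each sequence) and convertFromSequence (treat every value of every sequence as a separate example) can be sketched in plain Java. This is a conceptual illustration under stated assumptions, not the library's code: the `Event` record type, the class name, and both method names are hypothetical, and a standard `java.util.Comparator` stands in for SequenceComparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names belong to the library.
final class SequenceSketch {

    // Hypothetical record type: a key column plus a time column.
    static final class Event {
        final String key;
        final long time;
        Event(String key, long time) { this.key = key; this.time = time; }
    }

    // Group independent records by key, then order the values inside each group.
    static Map<String, List<Event>> toSequences(List<Event> records, Comparator<Event> order) {
        Map<String, List<Event>> sequences = new LinkedHashMap<>();
        for (Event e : records) {
            sequences.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
        }
        for (List<Event> sequence : sequences.values()) {
            sequence.sort(order);
        }
        return sequences;
    }

    // Flatten the sequences back into individual records.
    static List<Event> fromSequences(Map<String, List<Event>> sequences) {
        List<Event> flat = new ArrayList<>();
        for (List<Event> sequence : sequences.values()) {
            flat.addAll(sequence);
        }
        return flat;
    }

    public static void main(String[] args) {
        List<Event> records = Arrays.asList(
                new Event("a", 3), new Event("b", 1), new Event("a", 1));
        Map<String, List<Event>> sequences =
                toSequences(records, Comparator.comparingLong((Event e) -> e.time));
        System.out.println(sequences.get("a").size());      // prints 2
        System.out.println(sequences.get("a").get(0).time); // prints 1
        System.out.println(fromSequences(sequences).size()); // prints 3
    }
}
```

Note that flattening does not restore the original record order; the records come back grouped by key and sorted within each group.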
public TransformProcess.Builder splitSequence(SequenceSplit split)
split - SequenceSplit that defines how splits will occur

public TransformProcess.Builder reduce(IReducer reducer)
reducer - Reducer to use

public TransformProcess.Builder reduceSequenceByWindow(IReducer reducer, WindowFunction windowFunction)
reducer - Reducer to use to reduce each window
windowFunction - Window function to apply to each sequence individually

public TransformProcess.Builder calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator)
Currently, CalculateSortedRank can only be applied on standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.
newColumnName - Name of the new column (will contain the rank for each example)
sortOnColumn - Column to sort on
comparator - Comparator used to sort examples

public TransformProcess.Builder calculateSortedRank(String newColumnName, String sortOnColumn, WritableComparator comparator, boolean ascending)
Currently, CalculateSortedRank can only be applied on standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.
newColumnName - Name of the new column (will contain the rank for each example)
sortOnColumn - Column to sort on
comparator - Comparator used to sort examples
ascending - If true, sort ascending; if false, sort descending

public TransformProcess.Builder stringToCategorical(String columnName, List<String> stateNames)
columnName - Name of the String column to convert to categorical
stateNames - State names of the category

public TransformProcess.Builder stringRemoveWhitespaceTransform(String columnName)
columnName - Name of the column to remove whitespace from

public TransformProcess.Builder stringMapTransform(String columnName, Map<String,String> mapping)
Keys in the map are the original values; the values in the map are their replacements. If a String appears in the data but does not appear in the provided map (as a key), that String value will not be modified.
columnName - Name of the column in which to do replacement
mapping - Map of oldValues -> newValues

public TransformProcess.Builder stringToTimeTransform(String column, String format, org.joda.time.DateTimeZone dateTimeZone)
column - String column containing the date/time Strings
format - Format of the strings. Time format is specified as per http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
dateTimeZone - Timezone of the column

public TransformProcess.Builder appendStringColumnTransform(String column, String toAppend)
column - Column to append the value to
toAppend - String to append to the end of each writable

public TransformProcess.Builder conditionalReplaceValueTransform(String column, Writable newValue, Condition condition)
column - Column to operate on
newValue - Value to use as replacement, if condition is satisfied
condition - Condition that must be satisfied for replacement

public TransformProcess.Builder conditionalCopyValueTransform(String columnToReplace, String sourceColumn, Condition condition)
columnToReplace - Name of the column in which values will be replaced (if condition is satisfied)
sourceColumn - Name of the column from which the new values will be taken
condition - Condition to use

public TransformProcess build()
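
Both transform and filter are documented as being executed after the previously added operations, so the order of the builder calls determines the result. The following plain-Java sketch mimics that ordering with a tiny hypothetical builder over String records. It is not the library's implementation: `MiniProcess` and all of its members are invented for illustration, and it assumes that a filter removes the records that match its predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Plain-Java illustration only; none of these names belong to the library.
final class MiniProcess {

    // Each step maps a record to a new record, or to empty if a filter removed it.
    private final List<Function<String, Optional<String>>> steps;

    private MiniProcess(List<Function<String, Optional<String>>> steps) {
        this.steps = steps;
    }

    // Apply the steps to every record, in the order they were added to the builder.
    List<String> execute(List<String> input) {
        List<String> output = new ArrayList<>();
        for (String record : input) {
            Optional<String> current = Optional.of(record);
            for (Function<String, Optional<String>> step : steps) {
                current = current.flatMap(step);
            }
            current.ifPresent(output::add);
        }
        return output;
    }

    static final class Builder {
        private final List<Function<String, Optional<String>>> steps = new ArrayList<>();

        Builder transform(Function<String, String> transform) {
            steps.add(value -> Optional.of(transform.apply(value)));
            return this;
        }

        // Assumed semantics for this sketch: records matching the predicate are removed.
        Builder filter(Predicate<String> removeIf) {
            steps.add(value -> removeIf.test(value) ? Optional.<String>empty() : Optional.of(value));
            return this;
        }

        MiniProcess build() {
            return new MiniProcess(new ArrayList<>(steps));
        }
    }

    public static void main(String[] args) {
        MiniProcess process = new MiniProcess.Builder()
                .transform(String::trim)        // runs first
                .filter(String::isEmpty)        // sees the trimmed values
                .transform(String::toUpperCase) // runs last, on surviving records only
                .build();
        System.out.println(process.execute(Arrays.asList(" a ", "   ", "b"))); // prints [A, B]
    }
}
```

If the filter were added before the trim step, the whitespace-only record would not be empty when the filter ran and would survive, which is why the call order matters.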
Copyright © 2016. All rights reserved.