Package org.datavec.api.transform
Class TransformProcess
- java.lang.Object
-
- org.datavec.api.transform.TransformProcess
-
- All Implemented Interfaces:
Serializable
public class TransformProcess extends Object implements Serializable
- See Also:
- Serialized Form
-
-
Nested Class Summary
- static class TransformProcess.Builder
  Builder class for constructing a TransformProcess
-
Constructor Summary
- TransformProcess(Schema initialSchema, List<DataAction> actionList)
-
Method Summary
- List<Writable> execute(List<Writable> input)
  Execute the full sequence of transformations for a single example.
- List<List<Writable>> executeSequence(List<List<Writable>> inputSequence)
  Execute the full sequence of transformations for a single time series (sequence).
- List<List<Writable>> executeSequenceToSequence(List<List<Writable>> input)
- List<Writable> executeSequenceToSingle(List<List<Writable>> inputSequence)
  Execute a TransformProcess that starts with a sequence record, and converts it to a single (non-sequence) record.
- List<List<Writable>> executeToSequence(List<Writable> inputExample)
  Execute a TransformProcess that starts with a single (non-sequence) record, and converts it to a sequence record.
- List<List<List<Writable>>> executeToSequenceBatch(List<List<Writable>> inputExample)
  Execute a TransformProcess that starts with a single (non-sequence) record, and converts it to a sequence record.
- static TransformProcess fromJson(String json)
  Deserialize a JSON String (created by toJson()) to a TransformProcess.
- static TransformProcess fromYaml(String yaml)
  Deserialize a YAML String (created by toYaml()) to a TransformProcess.
- List<DataAction> getActionList()
  Get the action list that this transform process will execute.
- Schema getFinalSchema()
  Get the Schema of the output data, after executing the process.
- Schema getSchemaAfterStep(int step)
  Return the schema after executing all steps up to and including the specified step.
- static List<String> inferCategories(RecordReader recordReader, int columnIndex)
  Infer the categories for the given record reader for a particular column. Note that each "column index" is a column in the context of: List<Writable> record = ...; record.get(columnIndex); Note that anything passed in as a column will be automatically converted to a string for categorical purposes.
- static Map<Integer,List<String>> inferCategories(RecordReader recordReader, int[] columnIndices)
  Infer the categories for the given record reader for a particular set of columns (this is more efficient than inferCategories(RecordReader, int) if you have more than one column you plan on inferring categories for). Note that each "column index" is a column in the context of: List<Writable> record = ...; record.get(columnIndex); Note that anything passed in as a column will be automatically converted to a string for categorical purposes.
- String toJson()
  Convert the TransformProcess to a JSON string.
- String toYaml()
  Convert the TransformProcess to a YAML string.
- List<Writable> transformRawStringsToInput(String... values)
  Based on the input schema, map raw string values to the appropriate writable.
- List<Writable> transformRawStringsToInputList(List<String> values)
  Based on the input schema, map raw string values to the appropriate writable.
- List<List<Writable>> transformRawStringsToInputSequence(List<List<String>> sequence)
  Transforms a sequence of strings into a sequence of writables (very similar to transformRawStringsToInput(String...), but for sequences).
-
-
-
Constructor Detail
-
TransformProcess
public TransformProcess(Schema initialSchema, List<DataAction> actionList)
-
-
Method Detail
-
getActionList
public List<DataAction> getActionList()
Get the action list that this transform process will execute.
- Returns:
- The action list that this transform process will execute
-
getFinalSchema
public Schema getFinalSchema()
Get the Schema of the output data, after executing the process.
- Returns:
- Schema of the output data
-
getSchemaAfterStep
public Schema getSchemaAfterStep(int step)
Return the schema after executing all steps up to and including the specified step. Steps are indexed from 0, so getSchemaAfterStep(0) is the schema after one transform has been executed.
- Parameters:
step - Index of the step
- Returns:
- Schema of the data, after that (and all prior) steps have been executed
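The 0-based step indexing can be illustrated with a small stand-alone model. This is only a sketch, not DataVec code: a "schema" is reduced to a list of column names, each step is a plain function from one schema to the next, and the helper names (schemaAfterStep, removeColumn, addColumn) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Simplified stand-in (assumption): a schema is an ordered list of column
// names, and each step maps one schema to the next. Not the DataVec classes.
class SchemaAfterStepSketch {

    // Applies steps 0..step (inclusive) to the initial schema.
    static List<String> schemaAfterStep(List<String> initialSchema,
                                        List<UnaryOperator<List<String>>> steps,
                                        int step) {
        List<String> current = new ArrayList<>(initialSchema);
        for (int i = 0; i <= step; i++) {
            current = steps.get(i).apply(current);
        }
        return current;
    }

    // Hypothetical step: drop a column by name.
    static UnaryOperator<List<String>> removeColumn(String name) {
        return schema -> {
            List<String> out = new ArrayList<>(schema);
            out.remove(name);
            return out;
        };
    }

    // Hypothetical step: append a new column.
    static UnaryOperator<List<String>> addColumn(String name) {
        return schema -> {
            List<String> out = new ArrayList<>(schema);
            out.add(name);
            return out;
        };
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("id", "name", "age");
        List<UnaryOperator<List<String>>> steps = new ArrayList<>();
        steps.add(removeColumn("name"));
        steps.add(addColumn("ageSquared"));

        // Step 0: exactly one transform has been applied.
        System.out.println(schemaAfterStep(initial, steps, 0)); // [id, age]
        // Step 1: both transforms have been applied.
        System.out.println(schemaAfterStep(initial, steps, 1)); // [id, age, ageSquared]
    }
}
```

In this model, passing the index of the last step gives the same result that getFinalSchema() describes for the real class.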
-
execute
public List<Writable> execute(List<Writable> input)
Execute the full sequence of transformations for a single example. May return null if the example is filtered out.
NOTE: Some TransformProcess operations cannot be performed on examples individually. Most notably, ConvertToSequence and ConvertFromSequence operations require the full data set to be processed at once.
- Parameters:
input - The example to transform
- Returns:
- The transformed example, or null if it was filtered out
-
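Because execute may return null for a filtered example, callers need a null check before using the result. The following is a minimal stand-alone model of that contract, not the library's implementation: records are plain List<Object> values, steps are plain functions, and all names (execute, executeAll, exampleSteps) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Simplified stand-in (assumption): a step returns the transformed record,
// or null to signal that the record was filtered out.
class FilteredExecuteSketch {

    // Applies each step in order; returns null as soon as one step filters
    // the record, so later steps never see it.
    static List<Object> execute(List<Function<List<Object>, List<Object>>> steps,
                                List<Object> input) {
        List<Object> current = input;
        for (Function<List<Object>, List<Object>> step : steps) {
            current = step.apply(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Runs every example and keeps only those that survive filtering.
    static List<List<Object>> executeAll(List<Function<List<Object>, List<Object>>> steps,
                                         List<List<Object>> inputs) {
        List<List<Object>> kept = new ArrayList<>();
        for (List<Object> in : inputs) {
            List<Object> out = execute(steps, in);
            if (out != null) {
                kept.add(out);
            }
        }
        return kept;
    }

    // Hypothetical pipeline: drop records whose column 1 is negative,
    // then double column 1.
    static List<Function<List<Object>, List<Object>>> exampleSteps() {
        List<Function<List<Object>, List<Object>>> steps = new ArrayList<>();
        steps.add(r -> ((Integer) r.get(1)) < 0 ? null : r);
        steps.add(r -> Arrays.<Object>asList(r.get(0), ((Integer) r.get(1)) * 2));
        return steps;
    }

    public static void main(String[] args) {
        List<Function<List<Object>, List<Object>>> steps = exampleSteps();
        System.out.println(execute(steps, Arrays.<Object>asList("a", 3)));  // [a, 6]
        System.out.println(execute(steps, Arrays.<Object>asList("b", -1))); // null
    }
}
```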
executeSequenceToSequence
public List<List<Writable>> executeSequenceToSequence(List<List<Writable>> input)
- Parameters:
input
- Returns:
-
executeSequence
public List<List<Writable>> executeSequence(List<List<Writable>> inputSequence)
Execute the full sequence of transformations for a single time series (sequence). May return null if the example is filtered out.
-
executeToSequenceBatch
public List<List<List<Writable>>> executeToSequenceBatch(List<List<Writable>> inputExample)
Execute a TransformProcess that starts with a single (non-sequence) record, and converts it to a sequence record.
NOTE: This method has the following significant limitation: if it contains a ConvertToSequence op, it MUST be using singleStepSequencesMode - see ConvertToSequence for details.
This restriction is necessary because, when ConvertToSequence.singleStepSequencesMode is false, a group by operation is required (i.e., multiple independent records need to be grouped together by key(s)), which isn't possible here, when providing a single example as input.
- Parameters:
inputExample - Input example
- Returns:
- Sequence, after processing (or null, if it was filtered out)
-
executeToSequence
public List<List<Writable>> executeToSequence(List<Writable> inputExample)
Execute a TransformProcess that starts with a single (non-sequence) record, and converts it to a sequence record.
NOTE: This method has the following significant limitation: if it contains a ConvertToSequence op, it MUST be using singleStepSequencesMode - see ConvertToSequence for details.
This restriction is necessary because, when ConvertToSequence.singleStepSequencesMode is false, a group by operation is required (i.e., multiple independent records need to be grouped together by key(s)), which isn't possible here, when providing a single example as input.
- Parameters:
inputExample - Input example
- Returns:
- Sequence, after processing (or null, if it was filtered out)
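The reason for the singleStepSequencesMode restriction can be seen in a small stand-alone model. This is a sketch under simplifying assumptions, not DataVec code: records are plain List<Object> values and both helper methods are hypothetical. Wrapping one record as a length-1 sequence needs only that record, whereas grouping by key needs every record that might share the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in (assumption): contrasts the two ways a record can
// become a sequence. Not the library's ConvertToSequence implementation.
class SequenceGroupingSketch {

    // Single-step mode: the record alone is enough; it becomes a
    // sequence containing exactly one time step.
    static List<List<Object>> toSingleStepSequence(List<Object> record) {
        List<List<Object>> sequence = new ArrayList<>();
        sequence.add(record);
        return sequence;
    }

    // Group-by mode: records that share a key value form one sequence.
    // This cannot be computed from a single record in isolation.
    static Map<Object, List<List<Object>>> groupByKey(List<List<Object>> records,
                                                      int keyColumn) {
        Map<Object, List<List<Object>>> sequences = new LinkedHashMap<>();
        for (List<Object> record : records) {
            sequences.computeIfAbsent(record.get(keyColumn), k -> new ArrayList<>())
                     .add(record);
        }
        return sequences;
    }

    public static void main(String[] args) {
        List<List<Object>> records = new ArrayList<>();
        records.add(Arrays.<Object>asList("a", 1));
        records.add(Arrays.<Object>asList("b", 2));
        records.add(Arrays.<Object>asList("a", 3));

        System.out.println(toSingleStepSequence(records.get(0))); // [[a, 1]]
        System.out.println(groupByKey(records, 0)); // {a=[[a, 1], [a, 3]], b=[[b, 2]]}
    }
}
```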
-
executeSequenceToSingle
public List<Writable> executeSequenceToSingle(List<List<Writable>> inputSequence)
Execute a TransformProcess that starts with a sequence record, and converts it to a single (non-sequence) record.
- Parameters:
inputSequence - Input sequence
- Returns:
- Record after processing (or null if filtered out)
-
toJson
public String toJson()
Convert the TransformProcess to a JSON string.
- Returns:
- TransformProcess, as JSON
-
toYaml
public String toYaml()
Convert the TransformProcess to a YAML string.
- Returns:
- TransformProcess, as YAML
-
fromJson
public static TransformProcess fromJson(String json)
Deserialize a JSON String (created by toJson()) to a TransformProcess.
- Returns:
- TransformProcess, from JSON
-
fromYaml
public static TransformProcess fromYaml(String yaml)
Deserialize a YAML String (created by toYaml()) to a TransformProcess.
- Returns:
- TransformProcess, from YAML
-
inferCategories
public static List<String> inferCategories(RecordReader recordReader, int columnIndex)
Infer the categories for the given record reader for a particular column.
Note that each "column index" is a column in the context of: List<Writable> record = ...; record.get(columnIndex);
Note that anything passed in as a column will be automatically converted to a string for categorical purposes. The *expected* input is strings or numbers (which have sensible toString() representations).
Note that the returned categories will be sorted alphabetically.
- Parameters:
recordReader - the record reader to iterate through
columnIndex - the column index to get categories for
- Returns:
- the inferred categories
-
inferCategories
public static Map<Integer,List<String>> inferCategories(RecordReader recordReader, int[] columnIndices)
Infer the categories for the given record reader for a particular set of columns (this is more efficient than inferCategories(RecordReader, int) if you have more than one column you plan on inferring categories for).
Note that each "column index" is a column in the context of: List<Writable> record = ...; record.get(columnIndex);
Note that anything passed in as a column will be automatically converted to a string for categorical purposes. Results may vary depending on what's passed in. The *expected* input is strings or numbers (which have sensible toString() representations).
Note that the returned categories will be sorted alphabetically, for each column.
- Parameters:
recordReader - the record reader to scan
columnIndices - the column indices to get categories for
- Returns:
- the inferred categories
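The behaviour described for both inferCategories overloads (convert each value to a string, collect the distinct values per column, return them sorted) can be modelled in plain Java. This is a sketch only: an Iterable of List<Object> stands in for the RecordReader, and natural String ordering is assumed for "sorted alphabetically", which means numeric values sort as text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified stand-in (assumption): plain lists replace RecordReader and
// Writable. Not the library's implementation.
class InferCategoriesSketch {

    // One pass over the records collects categories for all requested columns.
    static Map<Integer, List<String>> inferCategories(Iterable<List<Object>> records,
                                                      int[] columnIndices) {
        Map<Integer, Set<String>> seen = new LinkedHashMap<>();
        for (int column : columnIndices) {
            seen.put(column, new TreeSet<>()); // TreeSet keeps values sorted and distinct
        }
        for (List<Object> record : records) {
            for (int column : columnIndices) {
                // Every value is converted to a string before comparison.
                seen.get(column).add(String.valueOf(record.get(column)));
            }
        }
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Set<String>> entry : seen.entrySet()) {
            result.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        return result;
    }

    // Single-column convenience form, delegating to the multi-column version.
    static List<String> inferCategories(Iterable<List<Object>> records, int columnIndex) {
        return inferCategories(records, new int[] {columnIndex}).get(columnIndex);
    }

    public static void main(String[] args) {
        List<List<Object>> records = new ArrayList<>();
        records.add(Arrays.<Object>asList("red", 10));
        records.add(Arrays.<Object>asList("blue", 2));
        records.add(Arrays.<Object>asList("red", 10));
        records.add(Arrays.<Object>asList("green", 2));

        System.out.println(inferCategories(records, 0)); // [blue, green, red]
        // Numbers are compared as strings, so "10" sorts before "2".
        System.out.println(inferCategories(records, 1)); // [10, 2]
    }
}
```

The multi-column form reads the data once for all columns, which is the efficiency point the description makes about preferring it over repeated single-column calls.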
-
transformRawStringsToInputSequence
public List<List<Writable>> transformRawStringsToInputSequence(List<List<String>> sequence)
Transforms a sequence of strings into a sequence of writables (very similar to transformRawStringsToInput(String...), but for sequences).
- Parameters:
sequence - the sequence to transform
- Returns:
- the transformed input
-
transformRawStringsToInputList
public List<Writable> transformRawStringsToInputList(List<String> values)
Based on the input schema, map raw string values to the appropriate writable.
- Parameters:
values - the values to convert
- Returns:
- the transformed values based on the schema
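The idea of mapping raw strings to typed values according to a schema can be sketched without the library. Everything here is an illustrative assumption: the ColumnType enum, the toTypedValues helper, the use of boxed Java types in place of Writable objects, and the choice to reject a value count that does not match the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in (assumption): a schema is a list of column types and
// typed values are boxed Java objects. Not the library's implementation.
class RawStringMappingSketch {

    enum ColumnType { STRING, INTEGER, DOUBLE }

    // Parses each raw string according to the type of its column.
    static List<Object> toTypedValues(List<ColumnType> schema, List<String> values) {
        if (schema.size() != values.size()) {
            // Assumption for this sketch: mismatched lengths are rejected.
            throw new IllegalArgumentException(
                    "Expected " + schema.size() + " values but got " + values.size());
        }
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            String raw = values.get(i);
            switch (schema.get(i)) {
                case INTEGER:
                    out.add(Integer.valueOf(raw));
                    break;
                case DOUBLE:
                    out.add(Double.valueOf(raw));
                    break;
                default:
                    out.add(raw);
                    break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ColumnType> schema =
                Arrays.asList(ColumnType.STRING, ColumnType.INTEGER, ColumnType.DOUBLE);
        System.out.println(toTypedValues(schema, Arrays.asList("a", "3", "2.5"))); // [a, 3, 2.5]
    }
}
```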
-
-