Class TransformProcess

    • Constructor Detail

    • Method Detail

      • getActionList

        public List<DataAction> getActionList()
        Get the list of actions that this transform process will execute
        Returns:
        The action list
      • getFinalSchema

        public Schema getFinalSchema()
        Get the Schema of the output data, after executing the process
        Returns:
        Schema of the output data
      • getSchemaAfterStep

        public Schema getSchemaAfterStep(int step)
        Return the schema after executing all steps up to and including the specified step. Steps are indexed from 0, so getSchemaAfterStep(0) returns the schema after one transform has been executed.
        Parameters:
        step - Index of the step
        Returns:
        Schema of the data, after that (and all prior) steps have been executed
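The zero-based step indexing can be illustrated with a minimal plain-Java sketch. It does not use DataVec classes: the schema is modeled as a list of column names and each step as a function on that list. The names `SchemaAfterStepSketch`, `columnsAfterStep`, `remove`, `rename`, and `demoSteps` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Plain-Java model (not DataVec code): a schema is a list of column names,
// and each step maps one column list to the next.
final class SchemaAfterStepSketch {

    // Applies steps 0..step inclusive, mirroring "up to and including the specified step".
    static List<String> columnsAfterStep(List<String> initial,
                                         List<UnaryOperator<List<String>>> steps,
                                         int step) {
        if (step < 0 || step >= steps.size()) {
            throw new IndexOutOfBoundsException("step: " + step);
        }
        List<String> current = new ArrayList<>(initial);
        for (int i = 0; i <= step; i++) {
            current = steps.get(i).apply(current);
        }
        return current;
    }

    // Hypothetical step: drop a column by name.
    static UnaryOperator<List<String>> remove(String name) {
        return cols -> {
            List<String> out = new ArrayList<>(cols);
            out.remove(name);
            return out;
        };
    }

    // Hypothetical step: rename a column, keeping its position.
    static UnaryOperator<List<String>> rename(String from, String to) {
        return cols -> {
            List<String> out = new ArrayList<>(cols);
            int idx = out.indexOf(from);
            if (idx >= 0) {
                out.set(idx, to);
            }
            return out;
        };
    }

    // Two example steps: step 0 removes "id", step 1 renames "ts" to "time".
    static List<UnaryOperator<List<String>>> demoSteps() {
        List<UnaryOperator<List<String>>> steps = new ArrayList<>();
        steps.add(remove("id"));
        steps.add(rename("ts", "time"));
        return steps;
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("id", "ts", "value");
        System.out.println(columnsAfterStep(initial, demoSteps(), 0)); // after one transform
        System.out.println(columnsAfterStep(initial, demoSteps(), 1)); // after two transforms
    }
}
```

In this model, step 0 already reflects the first transform, so the unmodified input columns are never returned by `columnsAfterStep`.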
      • execute

        public List<Writable> execute(List<Writable> input)
        Execute the full sequence of transformations for a single example. May return null if the example is filtered out.
        NOTE: Some TransformProcess operations cannot be performed on individual examples. Most notably, ConvertToSequence and ConvertFromSequence operations require the full data set to be processed at once.
        Parameters:
        input - Input example
        Returns:
        Record after processing (or null if filtered out)
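Because a filtered example yields null, callers that process many records one at a time typically need a null check. The following is a minimal plain-Java sketch of that pattern. It does not use DataVec classes; `FilteredExecuteSketch`, `applyAndDropFiltered`, and `demoTransform` are hypothetical names, and `demoTransform` merely stands in for a per-record transform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java model (not DataVec code) of handling null results from a per-record transform.
final class FilteredExecuteSketch {

    // Applies a per-record function to every record and skips null results,
    // which stand for records that were filtered out.
    static <T> List<T> applyAndDropFiltered(List<T> records, Function<T, T> perRecord) {
        List<T> kept = new ArrayList<>();
        for (T record : records) {
            T result = perRecord.apply(record);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }

    // Hypothetical stand-in for a per-record transform: filters out records whose
    // first value is negative, and doubles the first value otherwise.
    static List<Integer> demoTransform(List<Integer> record) {
        if (record.get(0) < 0) {
            return null;
        }
        List<Integer> out = new ArrayList<>(record);
        out.set(0, record.get(0) * 2);
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> records = Arrays.asList(
                Arrays.asList(1, 10), Arrays.asList(-2, 20), Arrays.asList(3, 30));
        System.out.println(applyAndDropFiltered(records, FilteredExecuteSketch::demoTransform));
    }
}
```

With DataVec, the per-record function would presumably be a call to execute on a TransformProcess instance, with the record type being List<Writable>.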
      • executeSequence

        public List<List<Writable>> executeSequence(List<List<Writable>> inputSequence)
        Execute the full sequence of transformations for a single time series (sequence). May return null if the sequence is filtered out.
      • executeToSequenceBatch

        public List<List<List<Writable>>> executeToSequenceBatch(List<List<Writable>> inputExample)
        Execute a TransformProcess that starts with single (non-sequence) records, and converts each one to a sequence record.
        NOTE: This method has the following significant limitation: if the process contains a ConvertToSequence op, it MUST be using singleStepSequencesMode - see ConvertToSequence for details.
        This restriction is necessary because, when ConvertToSequence.singleStepSequencesMode is false, a group-by operation is required - i.e., multiple independent records must be grouped together by key(s) - which is not possible here, where each example is processed on its own.
        Parameters:
        inputExample - Input examples (one non-sequence record per list entry)
        Returns:
        Sequences, after processing, one per input example (or null, if it was filtered out)
      • executeToSequence

        public List<List<Writable>> executeToSequence(List<Writable> inputExample)
        Execute a TransformProcess that starts with a single (non-sequence) record, and converts it to a sequence record.
        NOTE: This method has the following significant limitation: if the process contains a ConvertToSequence op, it MUST be using singleStepSequencesMode - see ConvertToSequence for details.
        This restriction is necessary because, when ConvertToSequence.singleStepSequencesMode is false, a group-by operation is required - i.e., multiple independent records must be grouped together by key(s) - which is not possible here, when providing a single example as input.
        Parameters:
        inputExample - Input example
        Returns:
        Sequence, after processing (or null, if it was filtered out)
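The difference between the two conversion modes can be sketched in plain Java. This is a simplified model, not DataVec code: records are lists of strings, ordering within a group is ignored, and `ConvertToSequenceSketch`, `singleStep`, and `groupByKey` are hypothetical names. It shows why a group-by conversion needs many records at once, while a single-step conversion can work on one record in isolation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (not DataVec code) of the two ways a record can become a sequence.
final class ConvertToSequenceSketch {

    // Single-step mode: one record becomes a sequence of length 1.
    // No other records are needed, so this works on a single example.
    static List<List<String>> singleStep(List<String> record) {
        List<List<String>> sequence = new ArrayList<>();
        sequence.add(new ArrayList<>(record));
        return sequence;
    }

    // Group-by mode: records that share a key value are collected into one sequence.
    // This needs every record with that key, so it cannot run on a single example.
    static Map<String, List<List<String>>> groupByKey(List<List<String>> records, int keyColumn) {
        Map<String, List<List<String>>> sequences = new LinkedHashMap<>();
        for (List<String> record : records) {
            sequences.computeIfAbsent(record.get(keyColumn), k -> new ArrayList<>()).add(record);
        }
        return sequences;
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
                Arrays.asList("a", "1"), Arrays.asList("b", "2"), Arrays.asList("a", "3"));
        System.out.println(singleStep(records.get(0)));
        System.out.println(groupByKey(records, 0));
    }
}
```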
      • executeSequenceToSingle

        public List<Writable> executeSequenceToSingle(List<List<Writable>> inputSequence)
        Execute a TransformProcess that starts with a sequence record, and converts it to a single (non-sequence) record
        Parameters:
        inputSequence - Input sequence
        Returns:
        Record after processing (or null if filtered out)
      • toJson

        public String toJson()
        Convert the TransformProcess to a JSON string
        Returns:
        TransformProcess, as JSON
      • toYaml

        public String toYaml()
        Convert the TransformProcess to a YAML string
        Returns:
        TransformProcess, as YAML
      • fromJson

        public static TransformProcess fromJson(String json)
        Deserialize a JSON String (created by toJson()) to a TransformProcess
        Returns:
        TransformProcess, from JSON
      • fromYaml

        public static TransformProcess fromYaml(String yaml)
        Deserialize a YAML String (created by toYaml()) to a TransformProcess
        Returns:
        TransformProcess, from YAML
      • inferCategories

        public static List<String> inferCategories(RecordReader recordReader,
                                                   int columnIndex)
        Infer the categories for a particular column of the given record reader.
        Note that each "column index" is a column in the context of: List record = ...; record.get(columnIndex);
        Note that anything passed in as a column will be automatically converted to a string for categorical purposes. The *expected* input is strings or numbers (which have sensible toString() representations).
        Note that the returned categories will be sorted alphabetically.
        Parameters:
        recordReader - the record reader to iterate through
        columnIndex - the column index to get categories for
        Returns:
        the inferred categories
      • inferCategories

        public static Map<Integer,List<String>> inferCategories(RecordReader recordReader,
                                                                int[] columnIndices)
        Infer the categories for a particular set of columns of the given record reader (this is more efficient than inferCategories(RecordReader, int) if you plan on inferring categories for more than one column).
        Note that each "column index" is a column in the context of: List record = ...; record.get(columnIndex);
        Note that anything passed in as a column will be automatically converted to a string for categorical purposes. Results may vary depending on what's passed in. The *expected* input is strings or numbers (which have sensible toString() representations).
        Note that the returned categories will be sorted alphabetically, for each column.
        Parameters:
        recordReader - the record reader to scan
        columnIndices - the column indices to get categories for
        Returns:
        the inferred categories
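The behavior described for both inferCategories overloads (string conversion, then alphabetical sorting per column) can be modeled in plain Java. This sketch is an assumption-laden illustration rather than the library's implementation: it reads from an in-memory list instead of a RecordReader, and `InferCategoriesSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java model (not DataVec code) of inferring sorted categories per column.
final class InferCategoriesSketch {

    // Collects the distinct string forms of each requested column in a single pass.
    // A TreeSet keeps the values de-duplicated and in alphabetical (lexicographic) order.
    static Map<Integer, List<String>> inferCategories(List<List<Object>> records, int... columnIndices) {
        Map<Integer, TreeSet<String>> seen = new LinkedHashMap<>();
        for (int column : columnIndices) {
            seen.put(column, new TreeSet<>());
        }
        for (List<Object> record : records) {
            for (int column : columnIndices) {
                seen.get(column).add(String.valueOf(record.get(column)));
            }
        }
        Map<Integer, List<String>> categories = new LinkedHashMap<>();
        for (Map.Entry<Integer, TreeSet<String>> entry : seen.entrySet()) {
            categories.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        return categories;
    }

    public static void main(String[] args) {
        List<List<Object>> records = Arrays.asList(
                Arrays.<Object>asList("red", 2),
                Arrays.<Object>asList("blue", 10),
                Arrays.<Object>asList("red", 2));
        System.out.println(inferCategories(records, 0, 1));
    }
}
```

One consequence visible in this model is that numeric values are ordered as strings, so "10" sorts before "2". Whether that matters depends on how the categories are used downstream.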
      • transformRawStringsToInputSequence

        public List<List<Writable>> transformRawStringsToInputSequence(List<List<String>> sequence)
        Transforms a sequence of strings into a sequence of writables (very similar to transformRawStringsToInput(String...), but for sequences)
        Parameters:
        sequence - the sequence to transform
        Returns:
        the transformed input
      • transformRawStringsToInputList

        public List<Writable> transformRawStringsToInputList(List<String> values)
        Based on the input schema, map raw string values to the appropriate writable
        Parameters:
        values - the values to convert
        Returns:
        the transformed values based on the schema
      • transformRawStringsToInput

        public List<Writable> transformRawStringsToInput(String... values)
        Based on the input schema, map raw string values to the appropriate writable
        Parameters:
        values - the values to convert
        Returns:
        the transformed values based on the schema
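The idea of mapping raw strings to typed values according to a schema can be sketched in plain Java. This is a simplified model, not DataVec code: the schema is a list of a hypothetical `ColumnType` enum, and boxed Java values stand in for writables. `RawStringsSketch` and `toTyped` are likewise hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model (not DataVec code) of schema-driven parsing of raw strings.
final class RawStringsSketch {

    // Hypothetical column types; a real schema supports more types than these.
    enum ColumnType { STRING, INTEGER, DOUBLE }

    // Parses each raw string according to the type of the column at the same position.
    static List<Object> toTyped(List<ColumnType> schema, String... values) {
        if (values.length != schema.size()) {
            throw new IllegalArgumentException(
                    "Expected " + schema.size() + " values but got " + values.length);
        }
        List<Object> typed = new ArrayList<>(values.length);
        for (int i = 0; i < values.length; i++) {
            switch (schema.get(i)) {
                case INTEGER:
                    typed.add(Integer.valueOf(values[i]));
                    break;
                case DOUBLE:
                    typed.add(Double.valueOf(values[i]));
                    break;
                default:
                    typed.add(values[i]);
                    break;
            }
        }
        return typed;
    }

    public static void main(String[] args) {
        List<ColumnType> schema = Arrays.asList(ColumnType.STRING, ColumnType.INTEGER, ColumnType.DOUBLE);
        System.out.println(toTyped(schema, "a", "3", "1.5"));
    }
}
```

The length check in this sketch is a design choice of the model: a mismatch between the number of values and the number of schema columns is reported immediately instead of surfacing later as an index error.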