com.salesforce.op.stages.impl.feature
uid for instance
Indicates whether to attempt language detection.
Language detection threshold. If no detected language has a confidence greater than the threshold, defaultLanguage is used.
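The fallback rule described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the helper chooseLanguage, the parameter name autoDetectThreshold, and the representation of detection results as a Map from language code to confidence are all assumptions made for the example.

```scala
object LanguageChoice {
  // Hypothetical helper: `detected` maps a language code to a confidence score.
  // `autoDetectThreshold` is an assumed parameter name for the detection threshold.
  def chooseLanguage(
    detected: Map[String, Double],
    autoDetectLanguage: Boolean,
    autoDetectThreshold: Double,
    defaultLanguage: String
  ): String = {
    if (!autoDetectLanguage) defaultLanguage
    else {
      // Keep only languages whose confidence is strictly above the threshold.
      val confident = detected.filter { case (_, conf) => conf > autoDetectThreshold }
      // Fall back to the default when nothing is confident enough.
      if (confident.isEmpty) defaultLanguage else confident.maxBy(_._2)._1
    }
  }
}
```

With a high threshold such as 0.99, most short or ambiguous inputs would fall back to defaultLanguage under this sketch.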
Default language to assume when autoDetectLanguage is disabled or fails to make a good enough prediction.
Hashes input sequence of values into OPVector using the supplied hashing params
HashingTF instance
Determine if the transformer should use a shared hash space for all features or not
true if the shared hash space is to be used, false otherwise
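To illustrate the difference between a shared and a per-feature hash space, here is a small sketch. It assumes that a separate hash space means each feature gets its own block of numFeatures slots in the output vector; the names bucket, hashAll, and numFeatures are hypothetical, and the standard library's MurmurHash3.stringHash stands in for whatever hash the HashingTF instance actually applies.

```scala
import scala.util.hashing.MurmurHash3

object HashSpaceSketch {
  // Map a token to a non-negative bucket in [0, numFeatures).
  def bucket(token: String, numFeatures: Int): Int = {
    val h = MurmurHash3.stringHash(token) % numFeatures
    if (h < 0) h + numFeatures else h
  }

  // `features` holds one token sequence per input feature.
  // shared = true:  all features write into the same numFeatures slots.
  // shared = false: feature i writes into its own block starting at i * numFeatures.
  def hashAll(features: Seq[Seq[String]], numFeatures: Int, shared: Boolean): Array[Double] = {
    val size = if (shared) numFeatures else numFeatures * features.size
    val out = Array.fill(size)(0.0)
    for ((tokens, i) <- features.zipWithIndex; t <- tokens) {
      val offset = if (shared) 0 else i * numFeatures
      out(offset + bucket(t, numFeatures)) += 1.0
    }
    out
  }
}
```

In this sketch a shared space keeps the vector small but lets the same token from different features land in the same slot, while separate spaces keep features apart at the cost of a longer vector.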
Minimum token length, >= 1.
Function that prepares the input columns to be hashed. Note that the Murmur3 hashing algorithm is only defined for primitive types, so tuples need to be converted to strings. MultiPickList sets are hashed as is, since there is no meaningful order in the selected choices. Lists and vectors can be hashed with or without their indices, since order may be important. Maps are hashed as (key,value) strings.
the element we are hashing (e.g., an OPList, OPMap, etc.)
an Iterable object corresponding to the hashed element
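The preparation step described above might look roughly like the sketch below. To keep it self-contained, the feature types are modelled with hypothetical stand-ins (PickSet, ListLike, MapLike) rather than the library's own classes, and the flag hashWithIndex and the "(index,value)" string layout are assumptions for illustration.

```scala
object PrepareSketch {
  // Hypothetical stand-ins for the set-, list- and map-valued feature types.
  sealed trait Element
  final case class PickSet(values: Set[String]) extends Element
  final case class ListLike(values: Seq[String]) extends Element
  final case class MapLike(values: Map[String, String]) extends Element

  // Turn an element into an Iterable of strings that a string hash can consume.
  def prepare(in: Element, hashWithIndex: Boolean): Iterable[String] = in match {
    // Sets have no meaningful order, so values are passed through unchanged.
    case PickSet(vs) => vs
    // Lists may be order-sensitive: optionally fold the position into the string.
    case ListLike(vs) if hashWithIndex => vs.zipWithIndex.map { case (v, i) => s"($i,$v)" }
    case ListLike(vs) => vs
    // Maps become "(key,value)" strings.
    case MapLike(kv) => kv.map { case (k, v) => s"($k,$v)" }
  }
}
```

Encoding tuples as strings is what allows a hash defined only on primitive values to be applied uniformly to every element type.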
Option to keep track of values that were missing
Indicates whether to convert all characters to lowercase before tokenizing.
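As a rough illustration of how the lowercase option and the minimum token length interact, consider the sketch below. It uses a plain regular-expression split as a stand-in for the real text analyzer, and the names tokenize, toLowercase, and minTokenLength are assumed for the example.

```scala
object TokenizeSketch {
  // Simplified tokenizer: split on runs of non-word characters.
  // This is a stand-in, not the analyzer the transformer actually uses.
  def tokenize(text: String, toLowercase: Boolean, minTokenLength: Int): Seq[String] = {
    require(minTokenLength >= 1, "minTokenLength must be >= 1")
    // Lowercasing happens before tokenizing, as the parameter description states.
    val normalized = if (toLowercase) text.toLowerCase else text
    // Drop tokens shorter than the minimum (this also removes empty strings).
    normalized.split("\\W+").toSeq.filter(_.length >= minTokenLength)
  }
}
```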
uid for instance
Convert a sequence of text features into a vector by detecting categoricals that are disguised as text. A categorical will be represented as a vector consisting of occurrences of the top K most common values of that feature, plus occurrences of non-top-K values and a null indicator (if enabled). Non-categoricals will be converted into a vector using the hashing trick. In addition, a null indicator is created for each non-categorical (if enabled).
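The overall idea can be sketched for a single text column as follows. The description above does not say how a categorical is detected, so this example assumes a simple rule: a column counts as categorical when its number of distinct values is at most maxCardinality. All names here (vectorize, pivot, hashed, maxCardinality, topK, numFeatures, trackNulls) are hypothetical, and the standard library's MurmurHash3 stands in for the actual hashing.

```scala
import scala.util.hashing.MurmurHash3

object SmartTextSketch {
  // Categorical layout: one slot per top value, then "other", then an optional null slot.
  def pivot(value: Option[String], topValues: Seq[String], trackNulls: Boolean): Array[Double] = {
    val out = Array.fill(topValues.size + 1 + (if (trackNulls) 1 else 0))(0.0)
    value match {
      case Some(v) =>
        val i = topValues.indexOf(v)
        out(if (i >= 0) i else topValues.size) = 1.0
      case None => if (trackNulls) out(topValues.size + 1) = 1.0
    }
    out
  }

  // Non-categorical layout: hashed token counts, then an optional null slot.
  def hashed(value: Option[String], numFeatures: Int, trackNulls: Boolean): Array[Double] = {
    val out = Array.fill(numFeatures + (if (trackNulls) 1 else 0))(0.0)
    value match {
      case Some(v) =>
        v.split("\\s+").filter(_.nonEmpty).foreach { t =>
          val h = MurmurHash3.stringHash(t) % numFeatures
          out(if (h < 0) h + numFeatures else h) += 1.0
        }
      case None => if (trackNulls) out(numFeatures) = 1.0
    }
    out
  }

  // Assumed detection rule: few distinct values means "categorical disguised as text".
  def vectorize(
    column: Seq[Option[String]],
    maxCardinality: Int,
    topK: Int,
    numFeatures: Int,
    trackNulls: Boolean
  ): Seq[Array[Double]] = {
    val counts = column.flatten.groupBy(identity).map { case (v, vs) => v -> vs.size }
    if (counts.size <= maxCardinality) {
      // Most frequent first; ties broken alphabetically for determinism.
      val top = counts.toSeq.sortBy { case (v, c) => (-c, v) }.take(topK).map(_._1)
      column.map(pivot(_, top, trackNulls))
    } else column.map(hashed(_, numFeatures, trackNulls))
  }
}
```

For a column with values a, a, b, c and one missing entry, topK = 2 and a generous maxCardinality, this sketch would pivot on a and b, send c to the "other" slot, and set the last slot for the missing row.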