Indicates whether to attempt language detection.
Language detection threshold. If none of the detected languages has a confidence greater than the threshold, defaultLanguage is used.
Default language to assume when autoDetectLanguage is disabled or fails to make a sufficiently confident prediction.
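The interplay between the detection flag, the threshold, and the default language can be illustrated with a minimal sketch. It is not the library's implementation: the object LanguageChoice, the function chooseLanguage, the parameter name autoDetectThreshold, and the shape of the detector output (a map from language code to confidence) are all assumptions made for illustration.

```scala
object LanguageChoice {
  /** Hypothetical helper: picks a language from detector confidences.
    * `detected` maps a language code to a confidence score. */
  def chooseLanguage(
      detected: Map[String, Double],
      autoDetectLanguage: Boolean,
      autoDetectThreshold: Double, // assumed name for the threshold parameter
      defaultLanguage: String
  ): String = {
    if (!autoDetectLanguage || detected.isEmpty) defaultLanguage
    else {
      val (bestLang, bestConf) = detected.maxBy(_._2)
      // Strictly greater than the threshold, as the description states.
      if (bestConf > autoDetectThreshold) bestLang else defaultLanguage
    }
  }
}
```

With a threshold of 0.9, a detection of "en" at 0.95 would be accepted, while a best score of 0.6 would fall back to the default language.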
Hashes the input sequence of values into an OPVector using the supplied hashing parameters.
HashingTF instance
Determines whether the transformer should use a shared hash space for all features.
true if the shared hash space is to be used, false otherwise
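The difference between a shared and a separate hash space can be sketched as follows. This is an illustration under stated assumptions, not the transformer's code: HashSpaceSketch, bucket, hashTF, and vectorize are hypothetical names, and the sketch uses the Scala standard library's MurmurHash3.stringHash, which is not guaranteed to produce the same buckets as Spark's HashingTF.

```scala
import scala.util.hashing.MurmurHash3

object HashSpaceSketch {
  /** Maps a token to a bucket in [0, numFeatures). */
  def bucket(token: String, numFeatures: Int): Int = {
    val h = MurmurHash3.stringHash(token) % numFeatures
    if (h < 0) h + numFeatures else h // keep the index non-negative
  }

  /** Term-frequency vector for one sequence of tokens. */
  def hashTF(tokens: Seq[String], numFeatures: Int): Array[Double] = {
    val v = Array.fill(numFeatures)(0.0)
    tokens.foreach(t => v(bucket(t, numFeatures)) += 1.0)
    v
  }

  /** `features` holds one token sequence per input feature. */
  def vectorize(features: Seq[Seq[String]], numFeatures: Int, shared: Boolean): Array[Double] =
    if (shared) hashTF(features.flatten, numFeatures) // one block for all features
    else Array.concat(features.map(hashTF(_, numFeatures)): _*) // one block per feature
}
```

In this sketch a shared space yields a vector of numFeatures entries regardless of the number of input features, at the cost of collisions across features; separate spaces yield numFeatures entries per feature.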
Minimum token length, >= 1.
Function that prepares the input columns to be hashed. Note that the MurmurHash3 algorithm is only defined for primitive types, so tuples need to be converted to strings. MultiPickList sets are hashed as is, since there is no meaningful order in the selected choices. Lists and vectors can be hashed with or without their indices, since order may be important. Maps are hashed as (key,value) strings.
element we are hashing (e.g. an OPList, OPMap, etc.)
an Iterable object corresponding to the hashed element
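The preparation rules described above can be sketched as a single pattern match. The types MultiPick, TextList, and TextMap are hypothetical stand-ins for the library's feature types, and the hashWithIndex flag and the exact "(value,index)" string format are assumptions; only the general rules (sets as is, lists optionally with indices, maps as (key,value) strings) come from the description.

```scala
object PrepareSketch {
  /** Hypothetical stand-ins for the feature types named in the text. */
  sealed trait Element
  final case class MultiPick(values: Set[String]) extends Element
  final case class TextList(values: Seq[String]) extends Element
  final case class TextMap(values: Map[String, String]) extends Element

  /** Turns an element into strings that a string hash can consume. */
  def prepare(in: Element, hashWithIndex: Boolean): Iterable[String] = in match {
    case MultiPick(vs) => vs // unordered choices: hash the values as is
    case TextList(vs) if hashWithIndex =>
      vs.zipWithIndex.map { case (v, i) => s"($v,$i)" } // keep position information
    case TextList(vs) => vs
    case TextMap(kv) => kv.toSeq.map { case (k, v) => s"($k,$v)" } // (key,value) strings
  }
}
```

Hashing with indices makes the same value at different positions land in (usually) different buckets, which matters when order carries meaning.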
Option to keep track of text lengths
Indicates whether to convert all characters to lowercase before performing string operations.
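As a rough illustration of how the lowercase flag and the minimum token length might interact during tokenization, consider the sketch below. It assumes simple whitespace tokenization, which is very likely cruder than the analyzer the library actually uses; TokenFilterSketch, tokenize, and the parameter names toLowercase and minTokenLength are hypothetical.

```scala
object TokenFilterSketch {
  /** Splits on whitespace, optionally lowercases, and drops short tokens. */
  def tokenize(text: String, toLowercase: Boolean, minTokenLength: Int): Seq[String] = {
    require(minTokenLength >= 1, "minTokenLength must be >= 1")
    val normalized = if (toLowercase) text.toLowerCase else text
    // Empty strings produced by leading whitespace have length 0 and are filtered out.
    normalized.split("\\s+").toSeq.filter(_.length >= minTokenLength)
  }
}
```

Lowercasing before hashing means "Fox" and "fox" are hashed as the same string, and a larger minimum token length removes very short tokens before they reach the hashing step.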