Package org.apache.camel.spi
Interface Tokenizer.Configuration
Enclosing interface: Tokenizer
public static interface Tokenizer.Configuration
A nested interface representing the configuration options for this tokenizer.
Implementors of this interface can set the maximum number of tokens, the maximum overlap between tokens, and the
type of tokenization being performed.
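As a rough illustration of what an implementor might look like, the sketch below declares a local interface, SketchConfiguration, that mirrors the non-deprecated setters listed on this page, plus a plain holder class, SimpleConfiguration. Both names, the fields, the describe() helper, and the "paragraph" type value are hypothetical. Real code would implement Tokenizer.Configuration from org.apache.camel.spi instead, and actual implementations may store or validate these values differently.

```java
// Local stand-in mirroring the setters documented on this page.
// Assumption: real code would implement org.apache.camel.spi.Tokenizer.Configuration.
interface SketchConfiguration {
    void setMaxSegmentSize(int maxSegmentSize);
    void setMaxOverlap(int maxOverlap);
    void setType(String type);
    void setModelName(String type);
}

// Hypothetical holder that simply remembers what was configured.
final class SimpleConfiguration implements SketchConfiguration {
    private int maxSegmentSize;
    private int maxOverlap;
    private String type;
    private String modelName;

    @Override
    public void setMaxSegmentSize(int maxSegmentSize) {
        this.maxSegmentSize = maxSegmentSize;
    }

    @Override
    public void setMaxOverlap(int maxOverlap) {
        this.maxOverlap = maxOverlap;
    }

    @Override
    public void setType(String type) {
        this.type = type;
    }

    @Override
    public void setModelName(String type) {
        // The parameter is named "type" in the documented signature,
        // but it carries the model name.
        this.modelName = type;
    }

    // Illustrative helper, not part of the documented interface.
    String describe() {
        return "type=" + type
                + ", maxSegmentSize=" + maxSegmentSize
                + ", maxOverlap=" + maxOverlap
                + ", modelName=" + modelName;
    }
}

class ConfigurationSketch {
    public static void main(String[] args) {
        SimpleConfiguration configuration = new SimpleConfiguration();
        configuration.setType("paragraph"); // hypothetical type value
        configuration.setMaxSegmentSize(500);
        configuration.setMaxOverlap(50);
        System.out.println(configuration.describe());
    }
}
```

The deprecated setMaxTokens is left out of the sketch on purpose, since the page directs callers to setMaxSegmentSize.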
Method Summary
Modifier and Type / Method / Description
void  setMaxOverlap(int maxOverlap)
    Sets the maximum overlap between tokens, where an overlap is defined as the number of characters that are common between two adjacent segments.
void  setMaxSegmentSize(int maxSegmentSize)
    Sets the maximum size of the segment to be tokenized and produced by the tokenizer.
void  setMaxTokens(int maxTokens)
    Deprecated, for removal: This API element is subject to removal in a future version.
void  setModelName(String type)
    Sets the underlying model used by the application.
void  setType(String type)
    Sets the type of tokenization being performed by this tokenizer.
Method Details
setMaxSegmentSize
void setMaxSegmentSize(int maxSegmentSize)
Sets the maximum size of the segments produced by the tokenizer. The size can be expressed either as a number of tokens, in which case the model name must be provided via setModelName, or as a maximum number of characters.
Parameters:
maxSegmentSize - the new maximum size of the segment
setMaxTokens
void setMaxTokens(int maxTokens)
Deprecated, for removal: This API element is subject to removal in a future version. Use setMaxSegmentSize instead.
Sets the maximum number of tokens to be produced by the tokenizer.
Parameters:
maxTokens - the new maximum number of tokens
setMaxOverlap
void setMaxOverlap(int maxOverlap)
Sets the maximum overlap between tokens, where an overlap is defined as the number of characters that are common between two adjacent segments.
Parameters:
maxOverlap - the new maximum overlap
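To make the overlap definition concrete, here is a minimal character-based sketch. The class OverlapSketch and its segment method are hypothetical and are not part of Camel. The sketch assumes every segment repeats exactly maxOverlap trailing characters of the previous one, whereas a real tokenizer treats the value as an upper bound and may overlap less, for example to respect word or sentence boundaries.

```java
import java.util.ArrayList;
import java.util.List;

class OverlapSketch {

    // Splits text into segments of at most maxSegmentSize characters.
    // Each segment after the first starts maxOverlap characters before
    // the end of the previous one, so adjacent segments share that text.
    static List<String> segment(String text, int maxSegmentSize, int maxOverlap) {
        if (maxSegmentSize <= 0 || maxOverlap < 0 || maxOverlap >= maxSegmentSize) {
            throw new IllegalArgumentException("require 0 <= maxOverlap < maxSegmentSize");
        }
        List<String> segments = new ArrayList<>();
        int step = maxSegmentSize - maxOverlap; // how far each new segment advances
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxSegmentSize, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the remaining text is already covered
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // prints [abcd, cdef, efgh, ghij]
        System.out.println(segment("abcdefghij", 4, 2));
    }
}
```

With a segment size of 4 and an overlap of 2, each adjacent pair of segments shares two characters ("cd", "ef", "gh").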
setType
void setType(String type)
Sets the type of tokenization being performed by this tokenizer. The accepted values are typically specific to the implementation.
Parameters:
type - the tokenization type
setModelName
void setModelName(String type)
Sets the underlying model used by the application. This is useful when the cost of processing a given text with that model needs to be known in advance. Providing a model name effectively switches the segment sizes to be computed in tokens.
Parameters:
type - the name of the underlying model
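The interaction between setMaxSegmentSize and setModelName can be sketched as follows. This is only an illustration of the rule described above, under the assumption that a non-empty model name is what selects token-based sizing. The class SegmentUnitSketch, the Unit enum, the unit() and describeLimit() helpers, and the "some-model" placeholder are all hypothetical, and a real implementation may decide this differently.

```java
class SegmentUnitSketch {

    // The two ways the documentation says a segment size can be measured.
    enum Unit { CHARACTERS, TOKENS }

    private int maxSegmentSize;
    private String modelName;

    void setMaxSegmentSize(int maxSegmentSize) {
        this.maxSegmentSize = maxSegmentSize;
    }

    void setModelName(String type) {
        this.modelName = type;
    }

    // Assumption: a non-empty model name means sizes are counted in tokens,
    // otherwise they are counted in characters.
    Unit unit() {
        boolean hasModel = modelName != null && !modelName.trim().isEmpty();
        return hasModel ? Unit.TOKENS : Unit.CHARACTERS;
    }

    String describeLimit() {
        return maxSegmentSize + (unit() == Unit.TOKENS ? " tokens" : " characters");
    }

    public static void main(String[] args) {
        SegmentUnitSketch configuration = new SegmentUnitSketch();
        configuration.setMaxSegmentSize(200);
        System.out.println(configuration.describeLimit()); // prints 200 characters

        configuration.setModelName("some-model"); // placeholder model name
        System.out.println(configuration.describeLimit()); // prints 200 tokens
    }
}
```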