Package com.yahoo.language.process
Interface Segmenter
All Known Implementing Classes:
SegmenterImpl
public interface Segmenter
Interface providing segmentation, i.e. splitting of CJK character blocks into separate tokens. This is primarily a convenience feature for users who don't need full tokenization (or who use a separate tokenizer and only need CJK processing).
Author:
Mathias Mølster Lidal
Method Summary
Modifier and Type: java.util.List<java.lang.String>
Method: segment(java.lang.String input, Language language)
Description: Splits the input string into tokens and returns a list of tokens in unprocessed form.
Method Detail
segment
java.util.List<java.lang.String> segment(java.lang.String input, Language language)
Splits the input string into tokens and returns a list of tokens in unprocessed form (i.e. lowercased, normalized and stemmed if applicable; see StemMode for the list of stemming options). The input is assumed to contain only word characters; any punctuation and spacing tokens are removed.
Parameters:
input - the text to segment
language - language of the input text
Returns:
the list of segments
Throws:
ProcessingException - if an exception is encountered during processing
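The contract described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the nested Segmenter interface only mirrors the documented signature, the Language enum is a hypothetical stand-in for the real Language type, and ToySegmenter is a naive character-based splitter, not how SegmenterImpl works. It shows only the shape of a call and the removal of punctuation and spacing from the returned list.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmenterSketch {

    // Hypothetical stand-in for the real Language type (assumption).
    enum Language { ENGLISH, JAPANESE }

    // Local mirror of the documented method signature, so the sketch compiles on its own.
    interface Segmenter {
        List<String> segment(String input, Language language);
    }

    // Toy implementation (assumption): each Han, Hiragana or Katakana character becomes
    // its own token, runs of other letters and digits stay together, and punctuation
    // and spacing are dropped. The language argument is ignored here.
    static class ToySegmenter implements Segmenter {
        @Override
        public List<String> segment(String input, Language language) {
            List<String> tokens = new ArrayList<>();
            StringBuilder run = new StringBuilder();
            for (int i = 0; i < input.length(); ) {
                int cp = input.codePointAt(i);
                i += Character.charCount(cp);
                if (isCjk(cp)) {
                    flush(run, tokens);
                    tokens.add(new String(Character.toChars(cp)));
                } else if (Character.isLetterOrDigit(cp)) {
                    run.appendCodePoint(cp);
                } else {
                    flush(run, tokens); // punctuation or spacing ends the current run
                }
            }
            flush(run, tokens);
            return tokens;
        }

        private static boolean isCjk(int cp) {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            return script == Character.UnicodeScript.HAN
                    || script == Character.UnicodeScript.HIRAGANA
                    || script == Character.UnicodeScript.KATAKANA;
        }

        // Moves the pending non-CJK run, if any, into the token list.
        private static void flush(StringBuilder run, List<String> tokens) {
            if (run.length() > 0) {
                tokens.add(run.toString());
                run.setLength(0);
            }
        }
    }

    public static void main(String[] args) {
        Segmenter segmenter = new ToySegmenter();
        List<String> segments = segmenter.segment("Vespa 検索, ok", Language.JAPANESE);
        System.out.println(segments);
    }
}
```

A real implementation would typically rely on a dictionary or statistical model to find word boundaries inside a CJK block, and could report failures through ProcessingException; the toy version does neither.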