Class SegmenterImpl

  • All Implemented Interfaces:
    Segmenter

    public class SegmenterImpl
    extends Object
    implements Segmenter
    Author:
    Simon Thoresen Hult
    • Constructor Detail

      • SegmenterImpl

        public SegmenterImpl(Tokenizer tokenizer)
    • Method Detail

      • segment

        public List<String> segment(String input,
                                    Language language)
        Description copied from interface: Segmenter
        Splits the input string into tokens and returns a list of tokens in unprocessed form (i.e. lowercased, normalized and stemmed if applicable; see StemMode for the list of stemming options). The input is assumed to contain only word characters; any punctuation and spacing tokens will be removed.
        Specified by:
        segment in interface Segmenter
        Parameters:
        input - the text to segment.
        language - language of input text.
        Returns:
        the list of segments.
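
The contract described above (text in, list of word segments out, punctuation and spacing dropped) can be illustrated with a small self-contained sketch. Everything in it is a hypothetical stand-in written for illustration: the nested `Language` enum, the nested `Segmenter` interface and `ToySegmenter` are not the library's types, and the toy implementation ignores the language argument and does no lowercasing, normalization or stemming. It only mirrors the documented `segment(String, Language)` signature.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmenterSketch {

    /** Hypothetical stand-in for the library's Language type. */
    enum Language { ENGLISH, UNKNOWN }

    /** Hypothetical stand-in mirroring the documented segment signature. */
    interface Segmenter {
        List<String> segment(String input, Language language);
    }

    /**
     * Toy segmenter (assumption, not the real SegmenterImpl): keeps runs of
     * letters and digits as segments and drops punctuation and spacing.
     * The language argument is accepted but ignored.
     */
    static class ToySegmenter implements Segmenter {
        @Override
        public List<String> segment(String input, Language language) {
            List<String> segments = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < input.length(); ) {
                int cp = input.codePointAt(i);
                if (Character.isLetterOrDigit(cp)) {
                    // Still inside a word: extend the current segment.
                    current.appendCodePoint(cp);
                } else if (current.length() > 0) {
                    // Hit punctuation or spacing: close the current segment.
                    segments.add(current.toString());
                    current.setLength(0);
                }
                i += Character.charCount(cp);
            }
            if (current.length() > 0) {
                segments.add(current.toString());
            }
            return segments;
        }
    }

    public static void main(String[] args) {
        Segmenter segmenter = new ToySegmenter();
        // Prints [Hello, world]
        System.out.println(segmenter.segment("Hello, world!", Language.ENGLISH));
    }
}
```

In real use, a SegmenterImpl would instead be constructed with a Tokenizer, as shown in the constructor detail, and the tokenizer would decide where segment boundaries fall for the given language.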