Class OpenNlpTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public class OpenNlpTokenizer
    extends java.lang.Object
    implements Tokenizer
    • Constructor Detail

      • OpenNlpTokenizer

        public OpenNlpTokenizer()
    • Method Detail

      • tokenize

        public java.lang.Iterable<Token> tokenize(java.lang.String input,
                                                  Language language,
                                                  StemMode stemMode,
                                                  boolean removeAccents)
        Description copied from interface: Tokenizer
        Returns the tokens produced from an input string under the rules of the given Language and the additional options.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        input - the string to tokenize. May be arbitrarily large.
        language - the language of the input string.
        stemMode - the stem mode applied to the returned tokens.
        removeAccents - if true, accents and similar marks are removed from the returned tokens.
        Returns:
        the tokens of the input String.
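
        Example (illustrative sketch only):
        The sketch below shows how a caller might invoke tokenize with the four parameters documented above and iterate over the result. So that it compiles on its own, it declares minimal hypothetical stand-ins for Language, StemMode, Token and Tokenizer; the real library types are richer. WhitespaceTokenizer, getTokenString, stripAccents and TokenizeSketch are names assumed for illustration, and the accent stripping via java.text.Normalizer is one plausible reading of removeAccents, not necessarily what OpenNlpTokenizer does internally. The toy implementation ignores language and stemMode.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the library types named in the signature above.
// The enum constants here are placeholders chosen for the example.
enum Language { ENGLISH, FRENCH }
enum StemMode { NONE, ALL }

// Minimal stand-in token that only carries its text.
final class Token {
    private final String tokenString;
    Token(String tokenString) { this.tokenString = tokenString; }
    String getTokenString() { return tokenString; }
}

// Same method shape as the documented tokenize signature.
interface Tokenizer {
    Iterable<Token> tokenize(String input, Language language,
                             StemMode stemMode, boolean removeAccents);
}

// Toy implementation: splits on whitespace and optionally strips accents.
// It ignores language and stemMode; a real tokenizer would not.
final class WhitespaceTokenizer implements Tokenizer {
    @Override
    public Iterable<Token> tokenize(String input, Language language,
                                    StemMode stemMode, boolean removeAccents) {
        List<Token> tokens = new ArrayList<>();
        for (String piece : input.trim().split("\\s+")) {
            if (piece.isEmpty()) continue; // blank input yields no tokens
            tokens.add(new Token(removeAccents ? stripAccents(piece) : piece));
        }
        return tokens;
    }

    // Assumed approach: decompose to NFD, then drop the combining marks.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }
}

final class TokenizeSketch {
    // Collects the token strings produced for one input.
    static List<String> tokenStrings(Tokenizer tokenizer, String input, boolean removeAccents) {
        List<String> out = new ArrayList<>();
        for (Token t : tokenizer.tokenize(input, Language.FRENCH, StemMode.NONE, removeAccents)) {
            out.add(t.getTokenString());
        }
        return out;
    }

    public static void main(String[] args) {
        // "Caf\u00e9 cr\u00e8me" is "Café crème" written with Unicode escapes.
        System.out.println(tokenStrings(new WhitespaceTokenizer(), "Caf\u00e9 cr\u00e8me", true));
        // prints [Cafe, creme]
    }
}
```

        With the real library, the call would presumably have the same shape: construct an OpenNlpTokenizer with its no-argument constructor and pass the library's own Language and StemMode values instead of the stand-ins.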