Class OpenNlpTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public class OpenNlpTokenizer
    extends Object
    implements Tokenizer
    Tokenizer using OpenNLP.
    Author:
    matskin
    • Method Detail

      • tokenize

        public Iterable<Token> tokenize(String input,
                                        Language language,
                                        StemMode stemMode,
                                        boolean removeAccents)
        Description copied from interface: Tokenizer
        Returns the tokens produced from an input string under the rules of the given Language and the additional options.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        input - the string to tokenize. May be arbitrarily large.
        language - the language of the input string.
        stemMode - the stem mode applied to the returned tokens.
        removeAccents - if true, accents and similar marks are removed from the returned tokens.
        Returns:
        the tokens of the input String.
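The removeAccents flag is documented only as removing "accents and similar" from the returned tokens. The sketch below shows one common way to get that effect in plain Java: Unicode NFD decomposition with java.text.Normalizer, followed by stripping combining marks (Unicode category Mn). This is an assumption for illustration, not the OpenNlpTokenizer implementation. The class AccentRemovalSketch and its whitespace-splitting tokenize helper are hypothetical. They do not use OpenNLP, and they ignore the Language and StemMode parameters.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of OpenNlpTokenizer or its API.
public class AccentRemovalSketch {

    // Decompose to NFD (e.g. "è" becomes "e" + U+0300), then drop all combining marks.
    static String removeAccents(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}+", "");
    }

    // Hypothetical stand-in tokenizer: splits on whitespace only and
    // optionally removes accents from each token.
    static List<String> tokenize(String input, boolean removeAccents) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // input was empty or whitespace-only
            }
            tokens.add(removeAccents ? removeAccents(raw) : raw);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Crème brûlée à Paris", true));
        System.out.println(tokenize("Crème brûlée à Paris", false));
    }
}
```

Running main prints [Creme, brulee, a, Paris] and then [Crème, brûlée, à, Paris]. NFD-based stripping leaves letters that have no canonical decomposition (for example ø or ß) unchanged, so the actual tokenizer may handle such characters differently.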