Interface TextTokenizer

    • Method Detail

      • getLanguage

        Languages getLanguage()
        Gets the language for the tokenizer.
        Returns:
        the language for this tokenizer.
      • tokenize

        Set<String> tokenize(String text)
        Tokenizes the given text and discards all stop-words from it.
        Parameters:
        text - the text to tokenize
        Returns:
        the set of tokens, excluding stop-words.
      • stopWords

        Set<String> stopWords()
        Gets all stop-words for this tokenizer's language.
        Returns:
        the set of all stop-words.
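
The three methods above can be read together as one small contract. The following is a minimal sketch of one possible implementation, not the library's own. The `Languages` enum and its `ENGLISH` constant, the `SimpleEnglishTokenizer` class, the stop-word list, and the choice to lower-case the text and split on non-letter characters are all assumptions made for illustration; the documentation above does not specify any of them.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the library's Languages type; its real constants are not shown above.
enum Languages { ENGLISH }

// Mirrors the three documented method signatures.
interface TextTokenizer {
    Languages getLanguage();
    Set<String> tokenize(String text);
    Set<String> stopWords();
}

// Illustrative implementation: lower-cases the text, splits on non-letters, drops stop-words.
class SimpleEnglishTokenizer implements TextTokenizer {
    // Assumed, deliberately tiny stop-word list.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "is");

    @Override
    public Languages getLanguage() {
        return Languages.ENGLISH;
    }

    @Override
    public Set<String> tokenize(String text) {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            // split() can yield an empty leading element; skip it along with stop-words.
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    @Override
    public Set<String> stopWords() {
        return STOP_WORDS;
    }
}
```

Under these assumptions, `new SimpleEnglishTokenizer().tokenize("The cat and the hat")` yields a set containing `cat` and `hat`. Because the return type is a `Set`, repeated words collapse to a single token, which callers should keep in mind if they need term frequencies.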