Interface Token


  • public interface Token
    A single token produced by the tokenizer.
    Author:
    Mathias Mølster Lidal
    • Method Summary

      Modifier and Type Method Description
      Token getComponent(int i)
      Returns the component token at position i of this token.
      int getNumComponents()
      Returns the number of components if this token is a compound word (e.g. German "kommunikationsfehler"); otherwise returns 0.
      int getNumStems()
      Returns the number of stem forms available for this token.
      long getOffset()
      Returns the offset position of this token
      java.lang.String getOrig()
      Returns the original form of this token
      TokenScript getScript()
      Returns the script of this token
      java.lang.String getStem(int i)
      Returns the stem at position i
      java.lang.String getTokenString()
      Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word.
      TokenType getType()
      Returns the type of this token (word, space, punctuation, etc.).
      boolean isIndexable()
      Whether this token should be indexed
      boolean isSpecialToken()
      Returns whether this is an instance of a declared special token (e.g. c++).
    • Method Detail

      • getType

        TokenType getType()
        Returns the type of this token (word, space, punctuation, etc.).
      • getOrig

        java.lang.String getOrig()
        Returns the original form of this token
      • getNumStems

        int getNumStems()
        Returns the number of stem forms available for this token.
      • getStem

        java.lang.String getStem(int i)
        Returns the stem at position i
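        A minimal sketch of how getNumStems() and getStem(int) might be used together to collect every stem of a token. It assumes that valid positions run from 0 to getNumStems() - 1, which this page does not state explicitly. StemSource and fixed(...) are hypothetical stand-ins introduced only to make the example self-contained; they are not part of the documented API.

```java
import java.util.ArrayList;
import java.util.List;

public class StemExample {

    // Hypothetical stand-in declaring only the two stem accessors of Token.
    interface StemSource {
        int getNumStems();
        String getStem(int i);
    }

    // Collects all stems, assuming positions 0 .. getNumStems() - 1 are valid.
    static List<String> allStems(StemSource token) {
        List<String> stems = new ArrayList<>();
        for (int i = 0; i < token.getNumStems(); i++) {
            stems.add(token.getStem(i));
        }
        return stems;
    }

    // Hypothetical array-backed implementation, used only for the demonstration.
    static StemSource fixed(String... stems) {
        return new StemSource() {
            public int getNumStems() { return stems.length; }
            public String getStem(int i) { return stems[i]; }
        };
    }

    public static void main(String[] args) {
        System.out.println(allStems(fixed("run", "running"))); // prints [run, running]
    }
}
```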
      • getNumComponents

        int getNumComponents()
        Returns the number of components if this token is a compound word (e.g. German "kommunikationsfehler"). Otherwise, returns 0.
        Returns:
        number of components, or 0 if none
      • getComponent

        Token getComponent(int i)
        Returns the component token at position i of this token.
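        The sketch below illustrates one way to walk the components of a compound token with getNumComponents() and getComponent(int). Tok and SimpleTok are hypothetical stand-ins that declare only the members used here, and the split of "kommunikationsfehler" into two parts is an illustrative assumption, not the output of any real tokenizer. Zero-based component positions are also assumed.

```java
import java.util.ArrayList;
import java.util.List;

public class ComponentExample {

    // Hypothetical stand-in with only the Token members used in this sketch.
    interface Tok {
        String getOrig();
        int getNumComponents();
        Tok getComponent(int i);
    }

    // Hypothetical in-memory token; a non-compound token has an empty part list.
    static final class SimpleTok implements Tok {
        private final String orig;
        private final List<Tok> parts;

        SimpleTok(String orig, List<Tok> parts) {
            this.orig = orig;
            this.parts = parts;
        }

        public String getOrig() { return orig; }
        public int getNumComponents() { return parts.size(); }
        public Tok getComponent(int i) { return parts.get(i); }
    }

    // Returns the original form of each component, or an empty list
    // when getNumComponents() is 0 (the token is not a compound word).
    static List<String> componentForms(Tok token) {
        List<String> forms = new ArrayList<>();
        for (int i = 0; i < token.getNumComponents(); i++) {
            forms.add(token.getComponent(i).getOrig());
        }
        return forms;
    }

    // Builds the illustrative compound token used in main.
    static Tok sampleCompound() {
        Tok first = new SimpleTok("kommunikations", List.of());
        Tok second = new SimpleTok("fehler", List.of());
        return new SimpleTok("kommunikationsfehler", List.of(first, second));
    }

    public static void main(String[] args) {
        System.out.println(componentForms(sampleCompound())); // prints [kommunikations, fehler]
    }
}
```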
      • getOffset

        long getOffset()
        Returns the offset position of this token
      • getScript

        TokenScript getScript()
        Returns the script of this token
      • getTokenString

        java.lang.String getTokenString()
        Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word. If this is a special token with a configured replacement, this returns the replacement token.
      • isSpecialToken

        boolean isSpecialToken()
        Returns whether this is an instance of a declared special token (e.g. c++)
      • isIndexable

        boolean isIndexable()
        Returns whether this token should be indexed.
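        As a rough illustration of how isIndexable() and getTokenString() could combine in an indexing step, the sketch below keeps only indexable tokens and collects their index forms. Tok and tok(...) are hypothetical stand-ins created for this example. Treating the whitespace token as non-indexable is an assumption made for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexTermsExample {

    // Hypothetical stand-in with only the Token members used in this sketch.
    interface Tok {
        boolean isIndexable();
        String getTokenString();
    }

    // Returns the index form of every token that reports itself as indexable.
    static List<String> indexTerms(List<? extends Tok> tokens) {
        List<String> terms = new ArrayList<>();
        for (Tok token : tokens) {
            if (token.isIndexable()) {
                terms.add(token.getTokenString());
            }
        }
        return terms;
    }

    // Hypothetical fixed-value token, used only for the demonstration.
    static Tok tok(String tokenString, boolean indexable) {
        return new Tok() {
            public boolean isIndexable() { return indexable; }
            public String getTokenString() { return tokenString; }
        };
    }

    public static void main(String[] args) {
        List<Tok> tokens = List.of(tok("hello", true), tok(" ", false), tok("world", true));
        System.out.println(indexTerms(tokens)); // prints [hello, world]
    }
}
```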