Class TextAnalyzerProperties


  • public final class TextAnalyzerProperties
    extends Object
    Author:
    Michele Rastelli
    • Constructor Detail

      • TextAnalyzerProperties

        public TextAnalyzerProperties()
    • Method Detail

      • getLocale

        public String getLocale()
        Returns:
        a locale in the format `language[_COUNTRY][.encoding][@variant]` (square brackets denote optional parts), e.g. `de.utf-8` or `en_US.utf-8`. Only UTF-8 encoding is meaningful in ArangoDB.
        See Also:
        Supported Languages
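
To make the locale format above concrete, here is a minimal sketch that splits a locale string into its parts. `LocaleSpec` and its regular expression are hypothetical helpers, not part of the driver or of ArangoDB. The accepted character classes (2 to 3 letters for the language, 2 letters for the country) are assumptions made for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the ArangoDB Java driver. */
final class LocaleSpec {
    // language[_COUNTRY][.encoding][@variant]; the character classes are assumptions.
    private static final Pattern FORMAT = Pattern.compile(
            "([A-Za-z]{2,3})(?:_([A-Za-z]{2}))?(?:\\.([A-Za-z0-9-]+))?(?:@([A-Za-z0-9]+))?");

    final String language;
    final String country;  // null when the optional part is absent
    final String encoding; // null when the optional part is absent
    final String variant;  // null when the optional part is absent

    private LocaleSpec(String language, String country, String encoding, String variant) {
        this.language = language;
        this.country = country;
        this.encoding = encoding;
        this.variant = variant;
    }

    /** Returns the parsed parts, or an empty Optional if the string does not fit the format. */
    static Optional<LocaleSpec> parse(String locale) {
        Matcher m = FORMAT.matcher(locale);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new LocaleSpec(m.group(1), m.group(2), m.group(3), m.group(4)));
    }
}
```

With this sketch, `LocaleSpec.parse("en_US.utf-8")` yields language `en`, country `US` and encoding `utf-8`, while `de.utf-8` has no country part.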
      • setLocale

        public void setLocale(String locale)
      • isAccent

        public boolean isAccent()
        Returns:
        true to preserve accented characters (default); false to convert accented characters to their base characters
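
As a rough illustration of what converting accented characters to their base characters means, the sketch below uses the JDK's `java.text.Normalizer`. This is an approximation for explanatory purposes only: `AccentSketch` is a hypothetical class, and the server-side Analyzer may handle individual characters differently.

```java
import java.text.Normalizer;

/** Hypothetical helper for illustration; not how the server-side Analyzer is implemented. */
final class AccentSketch {
    /**
     * Decomposes each character into base character plus combining marks (NFD),
     * then removes the combining marks.
     */
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Characters that have no canonical decomposition (for example `ø`) pass through this sketch unchanged.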
      • setAccent

        public void setAccent(boolean accent)
      • isStemming

        public boolean isStemming()
        Returns:
        true to apply stemming to the returned words (default); false to leave the tokenized words as-is
      • setStemming

        public void setStemming(boolean stemming)
      • getEdgeNgram

        public EdgeNgram getEdgeNgram()
        Returns:
        the edge n-gram configuration. If present, edge n-grams are generated for each token (word): the start of every n-gram is anchored to the beginning of the token, whereas the ngram Analyzer produces all possible substrings of a single input token (within the defined length restrictions). Edge n-grams can be used to serve word-based auto-completion queries from an index. For that use case, also set the following options:
        - accent: false
        - case: SearchAnalyzerCase.lower
        - stemming: false
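
The difference between edge n-grams and plain n-grams described above can be sketched as follows. `NgramSketch`, `edgeNgrams`, `allNgrams` and the `min`/`max` parameters are hypothetical names chosen for this illustration; they do not describe the `EdgeNgram` class or the server's tokenizer, and the code works on Java `char` units for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the ArangoDB Java driver. */
final class NgramSketch {
    /** Edge n-grams: prefixes of the token with a length between min and max (min >= 1 assumed). */
    static List<String> edgeNgrams(String token, int min, int max) {
        List<String> result = new ArrayList<>();
        int upper = Math.min(max, token.length());
        for (int len = min; len <= upper; len++) {
            result.add(token.substring(0, len)); // always anchored at index 0
        }
        return result;
    }

    /** Plain n-grams: substrings of length min..max starting at every offset of the token. */
    static List<String> allNgrams(String token, int min, int max) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = min; len <= max && start + len <= token.length(); len++) {
                result.add(token.substring(start, start + len));
            }
        }
        return result;
    }
}
```

For the token `quick` with lengths 2 to 4, `edgeNgrams` returns `qu`, `qui`, `quic`, which are exactly the prefixes a user types during auto-completion. `allNgrams` additionally returns inner substrings such as `ui` and `ick`, which an auto-completion index does not need.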
      • setEdgeNgram

        public void setEdgeNgram(EdgeNgram edgeNgram)
      • getStopwords

        public List<String> getStopwords()
        Returns:
        a list of words to omit from the result. Default: load words from stopwordsPath. To disable stop-word filtering, provide an empty list ([]). If both stopwords and stopwordsPath are provided, both word sources are combined.
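
The interplay between stopwords and stopwordsPath, as described here and under getStopwordsPath, can be summarized in a small decision sketch. `StopwordSources.effective` and its three parameters are hypothetical; the sketch models the documented rules only and assumes the word lists have already been read from disk.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; models the documented rules, not server code. */
final class StopwordSources {
    /**
     * @param inline      value of the stopwords attribute, or null if it is not set
     * @param fromPath    words read from an explicit stopwordsPath, or null if no path is set
     * @param fromDefault words read from the default location (environment variable or working directory)
     */
    static Set<String> effective(List<String> inline, List<String> fromPath, List<String> fromDefault) {
        Set<String> result = new LinkedHashSet<>();
        if (inline == null && fromPath == null) {
            // Nothing configured: fall back to the default location.
            result.addAll(fromDefault);
            return result;
        }
        if (inline != null) {
            result.addAll(inline); // an empty list contributes nothing, which disables filtering
        }
        if (fromPath != null) {
            result.addAll(fromPath); // an explicit path is combined with the inline words
        }
        return result;
    }
}
```

Note that an empty inline list without an explicit path produces an empty result, which corresponds to disabled stop-word filtering.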
      • setStopwords

        public void setStopwords(List<String> stopwords)
      • getStopwordsPath

        public String getStopwordsPath()
        Returns:
        path with a language sub-directory (e.g. en for a locale en_US.utf-8) containing files with words to omit. Each word has to be on a separate line. Everything after the first whitespace character on a line will be ignored and can be used for comments. The files can be named arbitrarily and have any file extension (or none).

        Default: if no path is provided, the value of the environment variable IRESEARCH_TEXT_STOPWORD_PATH is used; if that variable is undefined, the current working directory is assumed. If the stopwords attribute is provided, no stop-words are loaded from files unless an explicit stopwordsPath is also provided.

        Note that if the stopwordsPath cannot be accessed, is missing language sub-directories, or has no files for a language required by an Analyzer, the creation of a new Analyzer is refused. If such an issue is discovered for an existing Analyzer during startup, the server aborts with a fatal error.
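
The file format and the default path lookup described above can be sketched like this. `StopwordFiles`, `parseLines` and `languageDirectory` are hypothetical names, and the environment is passed in as a map (for example `System.getenv()`) to keep the sketch testable. Treating a line that starts with whitespace as containing no word is an assumption derived from the "first whitespace character" rule.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the ArangoDB Java driver. */
final class StopwordFiles {
    /** Keeps the text before the first whitespace character of each line; the rest is a comment. */
    static List<String> parseLines(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            String word = line.split("\\s", 2)[0];
            if (!word.isEmpty()) { // skips blank lines and lines starting with whitespace
                words.add(word);
            }
        }
        return words;
    }

    /** Resolves the language sub-directory: explicit path, then environment variable, then working directory. */
    static Path languageDirectory(String configuredPath, Map<String, String> env, String language) {
        String base = configuredPath;
        if (base == null) {
            base = env.get("IRESEARCH_TEXT_STOPWORD_PATH");
        }
        if (base == null) {
            base = ""; // an empty base resolves relative to the current working directory
        }
        return Paths.get(base, language);
    }
}
```

For a locale such as `en_US.utf-8`, the language argument would be `en`, so a caller could use `StopwordFiles.languageDirectory(null, System.getenv(), "en")` to find the directory that the files are expected in.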

      • setStopwordsPath

        public void setStopwordsPath(String stopwordsPath)
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object