Class TextAnalyzerProperties
- java.lang.Object
  - com.arangodb.entity.arangosearch.analyzer.TextAnalyzerProperties
public final class TextAnalyzerProperties extends Object
- Author:
- Michele Rastelli
-
Constructor Summary
- TextAnalyzerProperties()
-
Method Summary
- boolean equals(Object o)
- SearchAnalyzerCase getAnalyzerCase()
- EdgeNgram getEdgeNgram()
- String getLocale()
- List<String> getStopwords()
- String getStopwordsPath()
- int hashCode()
- boolean isAccent()
- boolean isStemming()
- void setAccent(boolean accent)
- void setAnalyzerCase(SearchAnalyzerCase analyzerCase)
- void setEdgeNgram(EdgeNgram edgeNgram)
- void setLocale(String locale)
- void setStemming(boolean stemming)
- void setStopwords(List<String> stopwords)
- void setStopwordsPath(String stopwordsPath)
-
Method Detail
-
getLocale
public String getLocale()
- Returns:
- a locale in the format `language[_COUNTRY][.encoding][@variant]` (square brackets denote optional parts), e.g. `de.utf-8` or `en_US.utf-8`. Only UTF-8 encoding is meaningful in ArangoDB.
- See Also:
- Supported Languages
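The locale format above can be illustrated with a small parser. `LocaleParts` is a hypothetical helper, not part of the ArangoDB Java driver; it only splits the string into the documented parts and assumes each separator (`_`, `.`, `@`) appears at most once. The `language` part is also what selects the stop-word sub-directory described under `getStopwordsPath()` (e.g. `en` for `en_US.utf-8`).

```java
// Hypothetical helper, not part of the ArangoDB Java driver: splits a locale string
// of the form language[_COUNTRY][.encoding][@variant] into its parts.
final class LocaleParts {
    final String language;
    final String country;  // null when absent
    final String encoding; // null when absent
    final String variant;  // null when absent

    private LocaleParts(String language, String country, String encoding, String variant) {
        this.language = language;
        this.country = country;
        this.encoding = encoding;
        this.variant = variant;
    }

    static LocaleParts parse(String locale) {
        String rest = locale;
        String variant = null;
        int at = rest.indexOf('@');
        if (at >= 0) { // strip the trailing @variant
            variant = rest.substring(at + 1);
            rest = rest.substring(0, at);
        }
        String encoding = null;
        int dot = rest.indexOf('.');
        if (dot >= 0) { // strip the .encoding
            encoding = rest.substring(dot + 1);
            rest = rest.substring(0, dot);
        }
        String country = null;
        int underscore = rest.indexOf('_');
        if (underscore >= 0) { // split language_COUNTRY
            country = rest.substring(underscore + 1);
            rest = rest.substring(0, underscore);
        }
        if (rest.isEmpty()) {
            throw new IllegalArgumentException("missing language in locale: " + locale);
        }
        return new LocaleParts(rest, country, encoding, variant);
    }

    public static void main(String[] args) {
        LocaleParts p = parse("en_US.utf-8");
        // prints: en US utf-8 null
        System.out.println(p.language + " " + p.country + " " + p.encoding + " " + p.variant);
    }
}
```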
-
setLocale
public void setLocale(String locale)
-
isAccent
public boolean isAccent()
- Returns:
- `true` to preserve accented characters (default), `false` to convert accented characters to their base characters
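What `accent: false` does can be approximated in plain Java by decomposing text to Unicode NFD and dropping the combining marks. This is an assumption for illustration only; `AccentFolding` is a hypothetical helper and ArangoDB's own folding may differ in edge cases.

```java
import java.text.Normalizer;

// Approximation of accent=false: decompose to NFD, then drop combining marks.
// Illustrative assumption only; ArangoDB's own folding may differ.
final class AccentFolding {
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks such as U+0301 (combining acute accent)
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "Creme Brulee" with accents, written as Unicode escapes to avoid source-encoding issues
        System.out.println(stripAccents("Cr\u00e8me Br\u00fbl\u00e9e")); // Creme Brulee
    }
}
```

Characters without a canonical decomposition, such as `ø` or `ß`, pass through this sketch unchanged.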
-
setAccent
public void setAccent(boolean accent)
-
getAnalyzerCase
public SearchAnalyzerCase getAnalyzerCase()
-
setAnalyzerCase
public void setAnalyzerCase(SearchAnalyzerCase analyzerCase)
- Parameters:
- analyzerCase - defaults to `SearchAnalyzerCase.lower`
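Indexed tokens and query terms only match if both are normalized the same way, which is why the case setting matters. The sketch below uses `CaseMode`, a hypothetical stand-in that assumes `SearchAnalyzerCase` offers `lower`, `upper` and `none`. Java's own case mapping is locale-sensitive, as the Turkish example shows; the sketch does not claim ArangoDB maps every character the same way.

```java
import java.util.Locale;

final class CaseDemo {
    // Hypothetical stand-in for SearchAnalyzerCase (assumed values: lower, upper, none)
    enum CaseMode { LOWER, UPPER, NONE }

    static String applyCase(String token, CaseMode mode, Locale locale) {
        switch (mode) {
            case LOWER:
                return token.toLowerCase(locale);
            case UPPER:
                return token.toUpperCase(locale);
            default: // NONE: keep the token as-is
                return token;
        }
    }

    public static void main(String[] args) {
        System.out.println(applyCase("ArangoSearch", CaseMode.LOWER, Locale.ROOT)); // arangosearch
        // Locale-sensitive mapping: in Turkish, 'I' lowercases to dotless i (U+0131)
        String turkish = applyCase("I", CaseMode.LOWER, Locale.forLanguageTag("tr"));
        System.out.println(turkish.equals("\u0131")); // true
    }
}
```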
-
isStemming
public boolean isStemming()
- Returns:
- `true` to apply stemming on returned words (default), `false` to leave the tokenized words as-is
-
setStemming
public void setStemming(boolean stemming)
-
getEdgeNgram
public EdgeNgram getEdgeNgram()
- Returns:
- if present, edge n-grams are generated for each token (word). That is, the start of each n-gram is anchored to the beginning of the token, whereas the `ngram` Analyzer produces all possible substrings of a single input token (within the defined length restrictions). Edge n-grams can be used to cover word-based auto-completion queries with an index, for which you should also set the following options:
- accent: false
- case: `SearchAnalyzerCase.lower`
- stemming: false
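To make the difference between edge n-grams and the `ngram` Analyzer concrete, the sketch below enumerates both for one token. `NgramDemo` and its `min`/`max` length bounds are illustrative assumptions (the options of `EdgeNgram` are not listed on this page), and it works on UTF-16 `char` positions, so it is only accurate for text without surrogate pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: edge n-grams versus all n-grams of a single token
final class NgramDemo {
    private static void check(int min, int max) {
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("require 1 <= min <= max");
        }
    }

    // Edge n-grams: every prefix of the token with a length in [min, max]
    static List<String> edgeNgrams(String token, int min, int max) {
        check(min, max);
        List<String> out = new ArrayList<>();
        int upper = Math.min(max, token.length());
        for (int len = min; len <= upper; len++) {
            out.add(token.substring(0, len));
        }
        return out;
    }

    // All n-grams: every substring with a length in [min, max], for comparison
    static List<String> allNgrams(String token, int min, int max) {
        check(min, max);
        List<String> out = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            int upper = Math.min(max, token.length() - start);
            for (int len = min; len <= upper; len++) {
                out.add(token.substring(start, start + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(edgeNgrams("quick", 2, 4)); // [qu, qui, quic]
        System.out.println(allNgrams("quick", 2, 4));  // [qu, qui, quic, ui, uic, uick, ic, ick, ck]
    }
}
```

Prefix-style auto-completion only needs the anchored prefixes, so edge n-grams yield 3 terms for `quick` here instead of 9.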
-
setEdgeNgram
public void setEdgeNgram(EdgeNgram edgeNgram)
-
getStopwords
public List<String> getStopwords()
- Returns:
- a list of words to omit from the result. Default: load words from `stopwordsPath`. To disable stop-word filtering, provide an empty list (an empty array `[]` in the Analyzer definition). If both `stopwords` and `stopwordsPath` are provided, both word sources are combined.
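The rules for combining `stopwords` and `stopwordsPath` (stated here and under `getStopwordsPath()`) can be summarized as a small decision function. `StopwordResolution.resolve` and its `loader` parameter are hypothetical: the loader stands in for reading the stop-word files of a path, and `null` models an attribute that was not set.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper summarizing the documented stop-word source rules
final class StopwordResolution {
    /**
     * @param stopwords     explicit word list, or null if the attribute is not set
     * @param stopwordsPath explicit path, or null if the attribute is not set
     * @param defaultPath   fallback location (IRESEARCH_TEXT_STOPWORD_PATH or the working directory)
     * @param loader        hypothetical function that reads the stop-word files for a path
     */
    static Set<String> resolve(List<String> stopwords, String stopwordsPath, String defaultPath,
                               Function<String, List<String>> loader) {
        Set<String> result = new LinkedHashSet<>();
        if (stopwords == null) {
            // No explicit list: load from the explicit path, else from the fallback location
            result.addAll(loader.apply(stopwordsPath != null ? stopwordsPath : defaultPath));
        } else {
            // Explicit list (empty and without a path means no filtering);
            // files are read only when a path is given explicitly
            result.addAll(stopwords);
            if (stopwordsPath != null) {
                result.addAll(loader.apply(stopwordsPath));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in loader that returns the same two words for any path
        Function<String, List<String>> loader = path -> List.of("the", "a");
        System.out.println(resolve(null, null, "/default", loader));                // [the, a]
        System.out.println(resolve(List.of(), null, "/default", loader));           // []
        System.out.println(resolve(List.of("foo"), "/custom", "/default", loader)); // [foo, the, a]
    }
}
```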
-
getStopwordsPath
public String getStopwordsPath()
- Returns:
- path to a directory with a language sub-directory (e.g. `en` for a locale `en_US.utf-8`) containing files with words to omit. Each word has to be on a separate line; everything after the first whitespace character on a line is ignored and can be used for comments. The files can be named arbitrarily and have any file extension (or none).
Default: if no path is provided, the value of the environment variable `IRESEARCH_TEXT_STOPWORD_PATH` is used to determine the path; if it is undefined, the current working directory is assumed. If the `stopwords` attribute is provided, no stop-words are loaded from files unless an explicit `stopwordsPath` is also provided.
Note that if the `stopwordsPath` cannot be accessed, is missing language sub-directories, or has no files for a language required by an Analyzer, the creation of a new Analyzer is refused. If such an issue is discovered for an existing Analyzer during startup, the server aborts with a fatal error.
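The stop-word file format described above can be read with standard Java. `StopwordFileParser` is a hypothetical helper, not the server's loader: it reads every regular file in `<stopwordsPath>/<language>/` as UTF-8 (an assumption), keeps the text before the first whitespace character on each line, and treats blank lines or lines starting with whitespace as containing no word. Throwing when the sub-directory is missing or empty loosely mirrors the refusal described above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper illustrating the documented stop-word file format
final class StopwordFileParser {
    // Keeps the text before the first whitespace character on each line;
    // blank lines and lines starting with whitespace contribute no word
    static List<String> parseLines(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            int end = 0;
            while (end < line.length() && !Character.isWhitespace(line.charAt(end))) {
                end++;
            }
            if (end > 0) {
                words.add(line.substring(0, end));
            }
        }
        return words;
    }

    // Reads every regular file in <stopwordsPath>/<language>/, whatever its name or extension
    static List<String> loadLanguage(Path stopwordsPath, String language) throws IOException {
        Path dir = stopwordsPath.resolve(language);
        List<Path> files;
        // Files.list throws NoSuchFileException (an IOException) if the sub-directory is missing
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        if (files.isEmpty()) {
            throw new IOException("no stop-word files for language: " + language);
        }
        List<String> words = new ArrayList<>();
        for (Path file : files) {
            words.addAll(parseLines(Files.readAllLines(file, StandardCharsets.UTF_8)));
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("stopwords");
        Path en = Files.createDirectory(root.resolve("en"));
        Files.write(en.resolve("common.txt"), List.of("the", "and   # a comment", "", "of"), StandardCharsets.UTF_8);
        System.out.println(loadLanguage(root, "en")); // [the, and, of]
    }
}
```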
-
setStopwordsPath
public void setStopwordsPath(String stopwordsPath)