org.opencms.search.galleries
Class CmsGallerySearchAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.StopwordAnalyzerBase
org.opencms.search.galleries.CmsGallerySearchAnalyzer
- All Implemented Interfaces:
- java.io.Closeable
public class CmsGallerySearchAnalyzer
- extends org.apache.lucene.analysis.StopwordAnalyzerBase
Special analyzer for multiple languages, used in the OpenCms gallery search index.
The gallery search is done in one single index that may contain multiple languages.
According to the Lucene JavaDocs (3.0 version), the Lucene StandardAnalyzer
already uses
"a good tokenizer for most European-language documents". The only caveat is that it
uses a list of English-only stop words.
This extended analyzer uses a compound list of stop words compiled from the following languages:
- English
- German
- Spanish
- Italian
- French
- Portuguese
- Danish
- Dutch
- Catalan
- Czech
- Since:
- 8.0.0
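The idea of a compound stop word list can be illustrated with a minimal, self-contained sketch. It does not use Lucene and is not the OpenCms implementation: the class name `CompoundStopWordsSketch`, its methods, and the three-word sample lists are hypothetical and chosen only for illustration. It merges per-language lists into one set and drops matching tokens from mixed-language text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CompoundStopWordsSketch {

    // Hypothetical, heavily shortened samples; NOT the stop word lists shipped with OpenCms.
    static final List<String> ENGLISH = Arrays.asList("the", "and", "of");
    static final List<String> GERMAN = Arrays.asList("der", "und", "von");
    static final List<String> FRENCH = Arrays.asList("le", "et", "de");

    /** Merges several per-language lists into one unmodifiable, lower-cased set. */
    @SafeVarargs
    static Set<String> compound(List<String>... lists) {
        Set<String> merged = new HashSet<>();
        for (List<String> list : lists) {
            for (String word : list) {
                merged.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return Collections.unmodifiableSet(merged);
    }

    /** Splits on whitespace, lower-cases each token, and drops tokens found in the stop word set. */
    static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = compound(ENGLISH, GERMAN, FRENCH);
        // prints [house, garten, jardin]
        System.out.println(filter("The house und der Garten et le jardin", stopWords));
    }
}
```

One consequence of a compound list is that a word counted as a stop word in one language is filtered in every language, even where it carries meaning (for example, "die" is a German article but an English verb).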
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords
Constructor Summary
CmsGallerySearchAnalyzer(org.apache.lucene.util.Version version)
Constructor with version parameter.
Method Summary
protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents
createComponents(java.lang.String fieldName,
java.io.Reader reader)
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
- Default maximum allowed token length.
- See Also:
- Constant Field Values
CmsGallerySearchAnalyzer
public CmsGallerySearchAnalyzer(org.apache.lucene.util.Version version)
throws java.io.IOException
- Constructor with version parameter.
- Parameters:
version
- the Lucene standard analyzer version to match
- Throws:
java.io.IOException
createComponents
protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(java.lang.String fieldName,
java.io.Reader reader)
- Specified by:
createComponents
in class org.apache.lucene.analysis.ReusableAnalyzerBase
- See Also:
This is taken from the Lucene StandardAnalyzer, which has been final since 3.1.