org.opencms.search.galleries
Class CmsGallerySearchAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.ReusableAnalyzerBase
          extended by org.apache.lucene.analysis.StopwordAnalyzerBase
              extended by org.opencms.search.galleries.CmsGallerySearchAnalyzer
All Implemented Interfaces:
Closeable

public class CmsGallerySearchAnalyzer
extends org.apache.lucene.analysis.StopwordAnalyzerBase

Special analyzer for multiple languages, used in the OpenCms gallery search index.

The gallery search uses a single index that may contain content in multiple languages.

According to the Lucene JavaDocs (version 3.0), the Lucene StandardAnalyzer already uses "a good tokenizer for most European-language documents". The only caveat is that its stop word list is English-only.

This extended analyzer uses a compound list of stop words compiled from the following languages:

Since:
8.0.0
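
The idea of a compound stop word list can be illustrated without Lucene. The following is a minimal sketch using only the Java standard library, not the OpenCms implementation: the class name CompoundStopWordsSketch, its methods, and the sample word lists are all hypothetical and chosen for illustration. It merges several per-language lists into one set and drops any token found in that set, regardless of the language the token came from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of OpenCms or Lucene.
final class CompoundStopWordsSketch {

    /** Merges several per-language stop word lists into one lower-cased set. */
    static Set<String> merge(List<List<String>> perLanguageLists) {
        Set<String> merged = new HashSet<String>();
        for (List<String> list : perLanguageLists) {
            for (String word : list) {
                merged.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return merged;
    }

    /** Lower-cases whitespace-separated tokens and drops those in the stop set. */
    static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !stopWords.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample words only (assumption); not the lists shipped with OpenCms.
        Set<String> stop = merge(Arrays.asList(
            Arrays.asList("the", "and", "of"),   // English sample
            Arrays.asList("der", "und", "das")   // German sample
        ));
        // prints [bild, gallery]
        System.out.println(filter("Das Bild und the Gallery", stop));
    }
}
```

A single merged set keeps lookup simple when one index holds several languages, at the cost that a stop word in one language is also removed from text in every other language.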

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents
 
Field Summary
static int DEFAULT_MAX_TOKEN_LENGTH
          Default maximum allowed token length
 
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords
 
Constructor Summary
CmsGallerySearchAnalyzer(org.apache.lucene.util.Version version)
          Constructor with version parameter.
 
Method Summary
protected  org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)
           
 
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet
 
Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MAX_TOKEN_LENGTH

public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length

See Also:
Constant Field Values
Constructor Detail

CmsGallerySearchAnalyzer

public CmsGallerySearchAnalyzer(org.apache.lucene.util.Version version)
                         throws IOException
Constructor with version parameter.

Parameters:
version - the Lucene standard analyzer version to match
Throws:
IOException
Method Detail

createComponents

protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                                                 Reader reader)
Specified by:
createComponents in class org.apache.lucene.analysis.ReusableAnalyzerBase
See Also:
This is taken from the Lucene StandardAnalyzer, which has been final since Lucene 3.1