org.opencms.search
Class CmsSearchIndex

java.lang.Object
  extended by org.opencms.search.CmsSearchIndex
All Implemented Interfaces:
I_CmsConfigurationParameterHandler
Direct Known Subclasses:
CmsGallerySearchIndex

public class CmsSearchIndex
extends Object
implements I_CmsConfigurationParameterHandler

Implements the search within an index and the management of the index configuration.

Since:
6.0.0

Nested Class Summary
protected  class CmsSearchIndex.LazyContentReader
          Lucene filter index reader implementation that will ensure the OpenCms default search index fields CmsSearchField.FIELD_CONTENT and CmsSearchField.FIELD_CONTENT_BLOB are lazy loaded.
 
Field Summary
static String BACKUP_REINDEXING
          Constant for additional parameter to enable optimized full index regeneration (default: false).
protected static org.apache.lucene.document.FieldSelector CONTENT_SELECTOR
          Field selector for Lucene that will ensure the OpenCms default search index fields CmsSearchField.FIELD_CONTENT and CmsSearchField.FIELD_CONTENT_BLOB are lazy loaded.
static String[] DATES
          Lookup table to quickly zero-pad days / months in date Strings.
static String[] DOC_META_FIELDS
          Constant for a field list that contains the "meta" field as well as the "content" field.
static String EXCERPT
          Constant for additional parameter to enable excerpt creation (default: true).
static String EXTRACT_CONTENT
          Constant for additional parameter for index content extraction.
static String LUCENE_AUTO_COMMIT
          Constant for additional parameter for the Lucene index setting.
static String LUCENE_MAX_MERGE_DOCS
          Constant for additional parameter for the Lucene index setting.
static String LUCENE_MERGE_FACTOR
          Constant for additional parameter for the Lucene index setting.
static String LUCENE_RAM_BUFFER_SIZE_MB
          Constant for additional parameter for the Lucene index setting.
static String LUCENE_USE_COMPOUND_FILE
          Constant for additional parameter for the Lucene index setting.
static org.apache.lucene.util.Version LUCENE_VERSION
          The Lucene Version used to create Query parsers and such.
protected  boolean m_requireViewPermission
          Controls if a resource requires view permission to be displayed in the result list.
protected  List<CmsSearchIndexSource> m_sources
          The list of configured index sources.
static String MAX_HITS
          Constant for additional parameter for controlling how many hits are loaded at maximum (default: 5000, see MAX_HITS_DEFAULT).
static int MAX_HITS_DEFAULT
          Indicates how many hits are loaded at maximum by default.
static int MAX_YEAR_RANGE
          Constant for years max range span in document search.
static String PERMISSIONS
          Constant for additional parameter to enable permission checks (default: true).
static String PRIORITY
          Constant for additional parameter to set the thread priority during search.
static String PROPERTY_SEARCH_EXCLUDE_VALUE_ALL
          Special value for the search.exclude property.
static String PROPERTY_SEARCH_EXCLUDE_VALUE_GALLERY
          Special value for the search.exclude property.
static String REBUILD_MODE_AUTO
          Automatic ("auto") index rebuild mode.
static String REBUILD_MODE_MANUAL
          Manual ("manual") index rebuild mode.
static String REBUILD_MODE_OFFLINE
          Offline ("offline") index rebuild mode.
static String TIME_RANGE
          Constant for additional parameter to enable time range checks (default: true).
static String USE_ALL_LOCALE
          The use all locale.
 
Fields inherited from interface org.opencms.configuration.I_CmsConfigurationParameterHandler
ADD_PARAMETER_METHOD, INIT_CONFIGURATION_METHOD
 
Constructor Summary
CmsSearchIndex()
          Default constructor only intended to be used by the XML configuration.
CmsSearchIndex(String name)
          Creates a new CmsSearchIndex with the given name.
 
Method Summary
 void addConfigurationParameter(String key, String value)
          Adds a parameter.
 void addSourceName(String sourceName)
          Adds an index source to this search index.
protected  org.apache.lucene.search.BooleanFilter appendCategoryFilter(CmsObject cms, org.apache.lucene.search.BooleanFilter filter, List<String> categories)
          Appends a category filter to the given filter clause that matches all given categories.
protected  org.apache.lucene.search.BooleanFilter appendDateCreatedFilter(org.apache.lucene.search.BooleanFilter filter, long startTime, long endTime)
          Appends a date of creation filter to the given filter clause that matches the given time range.
protected  org.apache.lucene.search.BooleanFilter appendDateLastModifiedFilter(org.apache.lucene.search.BooleanFilter filter, long startTime, long endTime)
          Appends a date of last modification filter to the given filter clause that matches the given time range.
protected  org.apache.lucene.search.BooleanFilter appendPathFilter(CmsObject cms, org.apache.lucene.search.BooleanFilter filter, List<String> roots)
          Appends a VFS path filter to the given filter clause that matches all given root paths.
protected  org.apache.lucene.search.BooleanFilter appendResourceTypeFilter(CmsObject cms, org.apache.lucene.search.BooleanFilter filter, List<String> resourceTypes)
          Appends a resource type filter to the given filter clause that matches all given resource types.
 boolean checkConfiguration(CmsObject cms)
          Checks if this index has been configured correctly.
protected  org.apache.lucene.search.Filter createDateRangeFilter(String fieldName, long startTime, long endTime)
          Creates an optimized date range filter for the date of last modification or creation.
protected  String createIndexBackup()
          Creates a backup of this index for optimized re-indexing of the whole content.
 boolean equals(Object obj)
           
protected  boolean excludeFromIndex(CmsObject cms, CmsResource resource)
          Checks if the provided resource should be excluded from this search index.
protected  void extendPathFilter(org.apache.lucene.search.TermsFilter pathFilter, String searchRoot)
          Extends the given path query with another term for the given search root element.
 org.apache.lucene.analysis.Analyzer getAnalyzer()
          Returns the Lucene analyzer used for this index.
 CmsParameterConfiguration getConfiguration()
          Returns the parameters of this configurable class instance, or null if the class does not need any parameters.
static List<String> getDateRangeSpan(long startDate, long endDate)
          Generates a list of date terms for the optimized date range search with "daily" granularity level.
 org.apache.lucene.document.Document getDocument(String rootPath)
          Returns the Lucene document with the given root path from the index.
 I_CmsDocumentFactory getDocumentFactory(CmsResource res)
          Returns the document type factory used for the given resource in this index, or null in case the resource is not indexed by this index.
 CmsSearchFieldConfiguration getFieldConfiguration()
          Returns the search field configuration of this index.
 String getFieldConfigurationName()
          Returns the name of the field configuration used for this index.
 I_CmsIndexWriter getIndexWriter(I_CmsReport report, boolean create)
          Returns a new index writer for this index.
 Locale getLocale()
          Returns the language locale of this index.
 Locale getLocaleForResource(CmsObject cms, CmsResource resource, List<Locale> availableLocales)
          Returns the language locale for the given resource in this index.
 String getLocaleString()
          Returns the language locale of the index as a String.
 int getMaxHits()
          Returns the maximum number of hits that are loaded.
protected  org.apache.lucene.search.Filter getMultiTermQueryFilter(String field, List<String> terms)
          Returns a cached Lucene term query filter for the given field and terms.
protected  org.apache.lucene.search.Filter getMultiTermQueryFilter(String field, String terms)
          Returns a cached Lucene term query filter for the given field and terms.
protected  org.apache.lucene.search.Filter getMultiTermQueryFilter(String field, String termsStr, List<String> termsList)
          Returns a cached Lucene term query filter for the given field and terms.
 String getName()
          Gets the name of this index.
 String getPath()
          Returns the path where this index stores its data in the "real" file system.
 int getPriority()
          Returns the Thread priority for this search index.
 String getProject()
          Gets the project of this index.
 String getRebuildMode()
          Get the rebuild mode of this index.
 org.apache.lucene.search.IndexSearcher getSearcher()
          Returns the Lucene index searcher used for this search index.
 List<String> getSourceNames()
          Returns all configured sources names of this search index.
 List<CmsSearchIndexSource> getSources()
          Returns all configured index sources of this search index.
protected  org.apache.lucene.search.Filter getTermQueryFilter(String field, String term)
          Returns a cached Lucene term query filter for the given field and term.
 int hashCode()
           
protected  boolean hasReadPermission(CmsObject cms, org.apache.lucene.document.Document doc)
          Checks if the OpenCms resource referenced by the result document can be read by the user of the given OpenCms context.
protected  void indexSearcherClose()
          Closes the Lucene index searcher for this index.
protected  void indexSearcherClose(org.apache.lucene.search.IndexSearcher searcher)
          Closes the given Lucene index searcher.
protected  void indexSearcherOpen(String path)
          Initializes the Lucene index searcher for this index.
protected  void indexSearcherUpdate()
          Reopens the Lucene index search reader for this index, required after the index has been changed.
protected  I_CmsIndexWriter indexWriterCreate(boolean create)
          Creates a new index writer.
protected  void indexWriterUnlock(I_CmsReport report)
          Unlocks the Lucene index writer of this index if required.
 void initConfiguration()
          Initializes a configuration after all parameters have been added.
 void initialize()
          Initializes the search index.
 boolean isBackupReindexing()
          Returns true if backup re-indexing is done by this index.
 boolean isCheckingPermissions()
          Returns true if permissions are checked for search results by this index.
 boolean isCheckingTimeRange()
          Returns true if the document time range is checked with a granularity level of seconds for search results by this index.
 boolean isCreatingExcerpt()
          Returns true if an excerpt is generated by this index.
 boolean isEnabled()
          Returns true if this index is currently enabled.
 boolean isExtractingContent()
          Returns true if full text is extracted by this index.
protected  boolean isInTimeRange(org.apache.lucene.document.Document doc, CmsSearchParameters params)
          Checks if the document is in the time range specified in the search parameters.
 boolean isRequireViewPermission()
          Returns true if a resource requires view permission to be included in the result list.
protected  void prepareSortScoring(org.apache.lucene.search.IndexSearcher searcher, org.apache.lucene.search.Sort sort)
          Checks if the score for the results must be calculated based on the provided sort option.
protected  void removeIndexBackup(String path)
          Removes the given backup folder of this index.
 void removeSourceName(String sourceName)
          Removes an index source from this search index.
 CmsSearchResultList search(CmsObject cms, CmsSearchParameters params)
          Performs a search on the index within the given fields.
 void setAnalyzer(org.apache.lucene.analysis.Analyzer analyzer)
          Sets the Lucene analyzer used for this index.
 void setEnabled(boolean enabled)
          Can be used to enable / disable this index.
 void setFieldConfiguration(CmsSearchFieldConfiguration fieldConfiguration)
          Sets the field configuration used for this index.
 void setFieldConfigurationName(String fieldConfigurationName)
          Sets the name of the field configuration used for this index.
 void setLocale(Locale locale)
          Sets the locale to index resources.
 void setLocaleString(String locale)
          Sets the locale to index resources as a String.
 void setMaxHits(int maxHits)
          Sets the maximum number of hits that are loaded.
 void setName(String name)
          Sets the logical key/name of this search index.
 void setProject(String projectName)
          Sets the name of the project used to index resources.
 void setProjectName(String projectName)
          Sets the name of the project used to index resources.
 void setRebuildMode(String rebuildMode)
          Sets the rebuild mode of this search index.
 void setRequireViewPermission(boolean requireViewPermission)
          Controls if a resource requires view permission to be displayed in the result list.
 void shutDown()
          Shuts down the search index.
 String toString()
          Returns the name (getName()) of this search index.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

BACKUP_REINDEXING

public static final String BACKUP_REINDEXING
Constant for additional parameter to enable optimized full index regeneration (default: false).


DATES

public static final String[] DATES
Lookup table to quickly zero-pad days / months in date Strings.
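
As a rough illustration of what such a lookup table buys, the sketch below builds a hypothetical table of two-digit Strings and uses it to assemble a zero-padded date term from numeric parts. The class name DatePadSketch, the table PADDED and the "yyyyMMdd" term layout are assumptions made for this example; the actual contents of DATES are not listed on this page.

```java
class DatePadSketch {

    /** Hypothetical lookup table: index i holds i as a two-digit, zero-padded String ("00" to "31"). */
    static final String[] PADDED = new String[32];

    static {
        for (int i = 0; i < PADDED.length; i++) {
            PADDED[i] = (i < 10 ? "0" : "") + i;
        }
    }

    /** Builds a "yyyyMMdd" style term using array lookups only (assumes a four-digit year). */
    static String dayTerm(int year, int month, int day) {
        return year + PADDED[month] + PADDED[day];
    }
}
```

A call such as DatePadSketch.dayTerm(2011, 3, 7) yields "20110307" without going through a date formatter.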


DOC_META_FIELDS

public static final String[] DOC_META_FIELDS
Constant for a field list that contains the "meta" field as well as the "content" field.


EXCERPT

public static final String EXCERPT
Constant for additional parameter to enable excerpt creation (default: true).


EXTRACT_CONTENT

public static final String EXTRACT_CONTENT
Constant for additional parameter for index content extraction.


LUCENE_AUTO_COMMIT

public static final String LUCENE_AUTO_COMMIT
Constant for additional parameter for the Lucene index setting.

See Also:
Constant Field Values

LUCENE_MAX_MERGE_DOCS

public static final String LUCENE_MAX_MERGE_DOCS
Constant for additional parameter for the Lucene index setting.

See Also:
Constant Field Values

LUCENE_MERGE_FACTOR

public static final String LUCENE_MERGE_FACTOR
Constant for additional parameter for the Lucene index setting.

See Also:
Constant Field Values

LUCENE_RAM_BUFFER_SIZE_MB

public static final String LUCENE_RAM_BUFFER_SIZE_MB
Constant for additional parameter for the Lucene index setting.

See Also:
Constant Field Values

LUCENE_USE_COMPOUND_FILE

public static final String LUCENE_USE_COMPOUND_FILE
Constant for additional parameter for the Lucene index setting.

See Also:
Constant Field Values

LUCENE_VERSION

public static final org.apache.lucene.util.Version LUCENE_VERSION
The Lucene Version used to create Query parsers and such.


MAX_HITS

public static final String MAX_HITS
Constant for additional parameter for controlling how many hits are loaded at maximum (default: 5000, see MAX_HITS_DEFAULT).


MAX_HITS_DEFAULT

public static final int MAX_HITS_DEFAULT
Indicates how many hits are loaded at maximum by default.

See Also:
Constant Field Values

MAX_YEAR_RANGE

public static final int MAX_YEAR_RANGE
Constant for years max range span in document search.

See Also:
Constant Field Values

PERMISSIONS

public static final String PERMISSIONS
Constant for additional parameter to enable permission checks (default: true).


PRIORITY

public static final String PRIORITY
Constant for additional parameter to set the thread priority during search.


PROPERTY_SEARCH_EXCLUDE_VALUE_ALL

public static final String PROPERTY_SEARCH_EXCLUDE_VALUE_ALL
Special value for the search.exclude property.

See Also:
Constant Field Values

PROPERTY_SEARCH_EXCLUDE_VALUE_GALLERY

public static final String PROPERTY_SEARCH_EXCLUDE_VALUE_GALLERY
Special value for the search.exclude property.

See Also:
Constant Field Values

REBUILD_MODE_AUTO

public static final String REBUILD_MODE_AUTO
Automatic ("auto") index rebuild mode.

See Also:
Constant Field Values

REBUILD_MODE_MANUAL

public static final String REBUILD_MODE_MANUAL
Manual ("manual") index rebuild mode.

See Also:
Constant Field Values

REBUILD_MODE_OFFLINE

public static final String REBUILD_MODE_OFFLINE
Offline ("offline") index rebuild mode.

See Also:
Constant Field Values

TIME_RANGE

public static final String TIME_RANGE
Constant for additional parameter to enable time range checks (default: true).


USE_ALL_LOCALE

public static final String USE_ALL_LOCALE
The use all locale.

See Also:
Constant Field Values

CONTENT_SELECTOR

protected static final org.apache.lucene.document.FieldSelector CONTENT_SELECTOR
Field selector for Lucene that will ensure the OpenCms default search index fields CmsSearchField.FIELD_CONTENT and CmsSearchField.FIELD_CONTENT_BLOB are lazy loaded.

This is to optimize performance - these 2 fields will be rather large especially for extracted binary documents like PDF, MS Office etc. By using lazy fields the data is only read when it is actually used.


m_requireViewPermission

protected boolean m_requireViewPermission
Controls if a resource requires view permission to be displayed in the result list.


m_sources

protected List<CmsSearchIndexSource> m_sources
The list of configured index sources.

Constructor Detail

CmsSearchIndex

public CmsSearchIndex()
Default constructor only intended to be used by the XML configuration.

It is recommended to use the constructor CmsSearchIndex(String) as it enforces the mandatory name argument.


CmsSearchIndex

public CmsSearchIndex(String name)
               throws CmsIllegalArgumentException
Creates a new CmsSearchIndex with the given name.

Parameters:
name - the system-wide unique name for the search index
Throws:
CmsIllegalArgumentException - if the given name is null, empty or already taken by another search index
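
The three rejection cases named above (null, empty, already taken) can be pictured with the following minimal sketch. It is not the OpenCms implementation: IndexNameSketch and its in-memory name set are hypothetical, IllegalArgumentException stands in for CmsIllegalArgumentException, and treating whitespace-only names as empty is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

class IndexNameSketch {

    /** Hypothetical registry of index names that are already in use. */
    private final Set<String> taken = new HashSet<>();

    /** Registers the name, rejecting null, empty and duplicate names. */
    void register(String name) {
        // Assumption: a whitespace-only name counts as empty.
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("index name must not be null or empty");
        }
        // Set.add returns false if the name was already present.
        if (!taken.add(name)) {
            throw new IllegalArgumentException("index name already in use: " + name);
        }
    }
}
```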
Method Detail

getDateRangeSpan

public static List<String> getDateRangeSpan(long startDate,
                                            long endDate)
Generates a list of date terms for the optimized date range search with "daily" granularity level.

How this works:

Parameters:
startDate - start date of the range to search in
endDate - end date of the range to search in
Returns:
a list of date terms for the optimized date range search
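
To make "daily" granularity concrete, here is a deliberately naive sketch that emits one term per calendar day touched by the range. It is not the optimized algorithm of this method, whose details are not given on this page. The class name DailyTermsSketch, the "yyyyMMdd" term format and the use of UTC are assumptions, and the sketch is only sensible for bounded ranges.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class DailyTermsSketch {

    /** Returns one "yyyyMMdd" term per calendar day (UTC) touched by [startDate, endDate], both in epoch milliseconds. */
    static List<String> dailyTerms(long startDate, long endDate) {
        List<String> terms = new ArrayList<>();
        if (startDate > endDate) {
            return terms; // empty range, nothing can match
        }
        LocalDate day = Instant.ofEpochMilli(startDate).atZone(ZoneOffset.UTC).toLocalDate();
        LocalDate last = Instant.ofEpochMilli(endDate).atZone(ZoneOffset.UTC).toLocalDate();
        while (!day.isAfter(last)) {
            // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd
            terms.add(day.format(DateTimeFormatter.BASIC_ISO_DATE));
            day = day.plusDays(1);
        }
        return terms;
    }
}
```

A document whose indexed day term equals any term in the returned list falls inside the range; anything finer than a day is invisible to such a term-based match.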

addConfigurationParameter

public void addConfigurationParameter(String key,
                                      String value)
Adds a parameter.

Specified by:
addConfigurationParameter in interface I_CmsConfigurationParameterHandler
Parameters:
key - the key/name of the parameter
value - the value of the parameter

addSourceName

public void addSourceName(String sourceName)
Adds an index source to this search index.

Parameters:
sourceName - the index source name to add

checkConfiguration

public boolean checkConfiguration(CmsObject cms)
Checks if this index has been configured correctly.

In case the check fails, the enabled property is set to false.

Parameters:
cms - an OpenCms user context to perform the checks with (should have "Administrator" permissions)
Returns:
true in case the index is correctly configured and enabled after the check
See Also:
isEnabled()

equals

public boolean equals(Object obj)
Overrides:
equals in class Object
See Also:
Object.equals(java.lang.Object)

getAnalyzer

public org.apache.lucene.analysis.Analyzer getAnalyzer()
Returns the Lucene analyzer used for this index.

Returns:
the Lucene analyzer used for this index

getConfiguration

public CmsParameterConfiguration getConfiguration()
Description copied from interface: I_CmsConfigurationParameterHandler
Returns the parameters of this configurable class instance, or null if the class does not need any parameters.

Specified by:
getConfiguration in interface I_CmsConfigurationParameterHandler
Returns:
the parameters of this configurable class instance, or null if the class does not need any parameters
See Also:
I_CmsConfigurationParameterHandler.getConfiguration()

getDocument

public org.apache.lucene.document.Document getDocument(String rootPath)
Returns the Lucene document with the given root path from the index.

Parameters:
rootPath - the root path of the document to get
Returns:
the Lucene document with the given root path from the index

getDocumentFactory

public I_CmsDocumentFactory getDocumentFactory(CmsResource res)
Returns the document type factory used for the given resource in this index, or null in case the resource is not indexed by this index.

A resource is indexed if the following is all true:

  1. The index contains at least one index source matching the root path of the given resource.
  2. For this matching index source, the document type factory needed by the resource is also configured.
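
The two conditions above can be sketched as a lookup over simplified stand-ins. Everything in this example is hypothetical: Source is not CmsSearchIndexSource, document types are plain Strings instead of I_CmsDocumentFactory instances, and path matching is assumed to be a simple prefix test.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IndexedCheckSketch {

    /** Hypothetical stand-in for an index source: root path prefixes plus configured document type names. */
    static final class Source {
        final List<String> roots;
        final Set<String> docTypes;

        Source(List<String> roots, String... docTypes) {
            this.roots = roots;
            this.docTypes = new HashSet<>(Arrays.asList(docTypes));
        }

        /** Condition 1: the source covers the root path of the resource. */
        boolean matchesPath(String rootPath) {
            for (String root : roots) {
                if (rootPath.startsWith(root)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Returns the needed type if a source matches both the path and the type, otherwise null. */
    static String findDocType(List<Source> sources, String rootPath, String neededType) {
        for (Source source : sources) {
            // Condition 2: the same matching source also has the needed document type configured.
            if (source.matchesPath(rootPath) && source.docTypes.contains(neededType)) {
                return neededType;
            }
        }
        return null; // resource is not indexed by this index
    }
}
```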

Parameters:
res - the resource to check
Returns:
the document type factory used for the given resource in this index, or null in case the resource is not indexed by this index

getFieldConfiguration

public CmsSearchFieldConfiguration getFieldConfiguration()
Returns the search field configuration of this index.

Returns:
the search field configuration of this index

getFieldConfigurationName

public String getFieldConfigurationName()
Returns the name of the field configuration used for this index.

Returns:
the name of the field configuration used for this index

getIndexWriter

public I_CmsIndexWriter getIndexWriter(I_CmsReport report,
                                       boolean create)
                                throws CmsIndexException
Returns a new index writer for this index.

Parameters:
report - the report to write error messages on
create - if true a whole new index is created, if false an existing index is updated
Returns:
a new instance of IndexWriter
Throws:
CmsIndexException - if the index cannot be opened

getLocale

public Locale getLocale()
Returns the language locale of this index.

Returns:
the language locale of this index, for example "en"

getLocaleForResource

public Locale getLocaleForResource(CmsObject cms,
                                   CmsResource resource,
                                   List<Locale> availableLocales)
Returns the language locale for the given resource in this index.

Parameters:
cms - the current OpenCms user context
resource - the resource to check
availableLocales - a list of locales supported by the resource
Returns:
the language locale for the given resource in this index

getLocaleString

public String getLocaleString()
Returns the language locale of the index as a String.

Returns:
the language locale of the index as a String
See Also:
getLocale()

getMaxHits

public int getMaxHits()
Returns the maximum number of hits that are loaded.

Since Lucene 2.4, the number of maximum documents to load from the index must be specified. The default of this setting is MAX_HITS_DEFAULT (5000). This means that at maximum 5000 results are returned from the index. Please note that this number may be reduced further because of OpenCms read permissions or per-user file visibility settings not controlled in the index.

Returns:
the maximum number of hits that are loaded
Since:
7.5.1

getName

public String getName()
Gets the name of this index.

Returns:
the name of the index

getPath

public String getPath()
Returns the path where this index stores its data in the "real" file system.

Returns:
the path where this index stores its data in the "real" file system

getPriority

public int getPriority()
Returns the Thread priority for this search index.

Returns:
the Thread priority for this search index

getProject

public String getProject()
Gets the project of this index.

Returns:
the project of the index, e.g. "online"

getRebuildMode

public String getRebuildMode()
Get the rebuild mode of this index.

Returns:
the current rebuild mode

getSearcher

public org.apache.lucene.search.IndexSearcher getSearcher()
Returns the Lucene index searcher used for this search index.

Returns:
the Lucene index searcher used for this search index

getSourceNames

public List<String> getSourceNames()
Returns all configured sources names of this search index.

Returns:
a list with all configured sources names of this search index

getSources

public List<CmsSearchIndexSource> getSources()
Returns all configured index sources of this search index.

Returns:
all configured index sources of this search index

hashCode

public int hashCode()
Overrides:
hashCode in class Object
See Also:
Object.hashCode()

initConfiguration

public void initConfiguration()
Description copied from interface: I_CmsConfigurationParameterHandler
Initializes a configuration after all parameters have been added.

Specified by:
initConfiguration in interface I_CmsConfigurationParameterHandler
See Also:
I_CmsConfigurationParameterHandler.initConfiguration()

initialize

public void initialize()
                throws CmsSearchException
Initializes the search index.

Throws:
CmsSearchException - if the index source association failed

isBackupReindexing

public boolean isBackupReindexing()
Returns true if backup re-indexing is done by this index.

This is an optimization method by which the old extracted content is reused in order to save performance when re-indexing.

Returns:
true if backup re-indexing is done by this index
Since:
7.5.1

isCheckingPermissions

public boolean isCheckingPermissions()
Returns true if permissions are checked for search results by this index.

If permission checks are not required, they can be turned off in the index search configuration parameters in opencms-search.xml. Not checking permissions will improve performance.

This can be of use in scenarios where you know that all search results are always readable, which is usually true for public websites that do not have personalized accounts.

Please note that even if a result is returned for which the current user has no read permission, the user cannot actually access this document. It will only appear in the search result list; if the user clicks the link to open the document, an error is shown.
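
The trade-off described above can be pictured as an optional post-filter over the raw hits. This is a simplified sketch and not how the index is implemented: PermissionFilterSketch is a hypothetical name, hits are reduced to path Strings, and the canRead predicate stands in for the real OpenCms permission check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class PermissionFilterSketch {

    /** Returns the hits to display, applying the per-hit read check only when it is enabled. */
    static List<String> visibleHits(List<String> rawHits, boolean checkPermissions, Predicate<String> canRead) {
        if (!checkPermissions) {
            // Cheapest path: no per-hit check, unreadable documents may be listed.
            return new ArrayList<>(rawHits);
        }
        List<String> result = new ArrayList<>();
        for (String path : rawHits) {
            if (canRead.test(path)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

With the check enabled, the displayed list can be shorter than the number of hits loaded from the index, because every rejected hit is dropped after loading.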

Returns:
true if permissions are checked for search results by this index

isCheckingTimeRange

public boolean isCheckingTimeRange()
Returns true if the document time range is checked with a granularity level of seconds for search results by this index.

Since OpenCms 8.0, time range checks are always done if CmsSearchParameters.setMinDateLastModified(long) or any of the corresponding methods are used. This is done very efficiently using optimized Lucene filters. However, these checks only have daily granularity, which means that you can find "changes made yesterday" but not "changes made in the last hour". For normal limitation of search results, a daily granularity should be enough.

If time range checks with a granularity level of seconds are required, they can be turned on in the index search configuration parameters in opencms-search.xml. Not checking the time range with a granularity level of seconds will improve performance.

By default the granularity level of seconds is turned off since OpenCms 8.0.
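
The difference between the two granularity levels can be shown with a small comparison. This is an illustrative sketch only: TimeRangeSketch and its methods are hypothetical, only a lower bound is checked, timestamps are compared in milliseconds as a stand-in for the finer-grained check, and UTC is assumed for the day boundary.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class TimeRangeSketch {

    /** Truncates an epoch millisecond timestamp to its calendar day (UTC assumed). */
    static LocalDate dayOf(long millis) {
        return Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    /** Daily granularity: matches if the document's day is not before the day of minDate. */
    static boolean matchesDaily(long docDate, long minDate) {
        return !dayOf(docDate).isBefore(dayOf(minDate));
    }

    /** Exact comparison, standing in for the optional finer-grained check. */
    static boolean matchesExact(long docDate, long minDate) {
        return docDate >= minDate;
    }
}
```

A document modified at 08:00 still passes the daily check for a lower bound of 12:00 on the same day, while the exact check rejects it.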

Returns:
true if the document time range is checked with a granularity level of seconds for search results by this index

isCreatingExcerpt

public boolean isCreatingExcerpt()
Returns true if an excerpt is generated by this index.

If no excerpt is required, generation can be turned off in the index search configuration parameters in opencms-search.xml. Not generating an excerpt will improve performance.

Returns:
true if an excerpt is generated by this index

isEnabled

public boolean isEnabled()
Returns true if this index is currently enabled.

Returns:
true if this index is currently enabled

isExtractingContent

public boolean isExtractingContent()
Returns true if full text is extracted by this index.

Full text content extraction can be turned off in the index search configuration parameters in opencms-search.xml. Not extracting the full text information will greatly improve performance.

Returns:
true if full text is extracted by this index

isRequireViewPermission

public boolean isRequireViewPermission()
Returns true if a resource requires view permission to be included in the result list.

Returns:
true if a resource requires view permission to be included in the result list

removeSourceName

public void removeSourceName(String sourceName)
Removes an index source from this search index.

Parameters:
sourceName - the index source name to remove

search

public CmsSearchResultList search(CmsObject cms,
                                  CmsSearchParameters params)
                           throws CmsSearchException
Performs a search on the index within the given fields.

The result is returned as List with entries of type I_CmsSearchResult.

Parameters:
cms - the current user's Cms object
params - the parameters to use for the search
Returns:
the List of results found or an empty list
Throws:
CmsSearchException - if something goes wrong

setAnalyzer

public void setAnalyzer(org.apache.lucene.analysis.Analyzer analyzer)
Sets the Lucene analyzer used for this index.

Parameters:
analyzer - the Lucene analyzer to set

setEnabled

public void setEnabled(boolean enabled)
Can be used to enable / disable this index.

Parameters:
enabled - the state of the index to set

setFieldConfiguration

public void setFieldConfiguration(CmsSearchFieldConfiguration fieldConfiguration)
Sets the field configuration used for this index.

Parameters:
fieldConfiguration - the field configuration to set

setFieldConfigurationName

public void setFieldConfigurationName(String fieldConfigurationName)
Sets the name of the field configuration used for this index.

Parameters:
fieldConfigurationName - the name of the field configuration to set

setLocale

public void setLocale(Locale locale)
Sets the locale to index resources.

Parameters:
locale - the locale to index resources

setLocaleString

public void setLocaleString(String locale)
Sets the locale to index resources as a String.

Parameters:
locale - the locale to index resources
See Also:
setLocale(Locale)

setMaxHits

public void setMaxHits(int maxHits)
Sets the maximum number of hits that are loaded.

This must be set at least to 50, or this setting is ignored.
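
One possible reading of this rule is sketched below. The class MaxHitsSketch is hypothetical, and keeping the previous value when the argument is too small is an assumption; this page only states that such a setting is ignored. The default of 5000 is the value quoted in the getMaxHits() description.

```java
class MaxHitsSketch {

    /** Default quoted in the getMaxHits() description. */
    static final int MAX_HITS_DEFAULT = 5000;

    /** Lower bound stated for setMaxHits(int). */
    static final int MIN_HITS = 50;

    private int maxHits = MAX_HITS_DEFAULT;

    /** Accepts values of at least MIN_HITS; smaller values are ignored (assumed: previous value stays). */
    void setMaxHits(int value) {
        if (value >= MIN_HITS) {
            maxHits = value;
        }
    }

    int getMaxHits() {
        return maxHits;
    }
}
```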

Parameters:
maxHits - the maximum number of hits to load
Since:
7.5.1
See Also:
getMaxHits()

setName

public void setName(String name)
             throws CmsIllegalArgumentException
Sets the logical key/name of this search index.

Parameters:
name - the logical key/name of this search index
Throws:
CmsIllegalArgumentException - if the given name is null, empty or already taken by another search index

setProject

public void setProject(String projectName)
Sets the name of the project used to index resources.

A duplicate of setProjectName(String) that allows instances of this class to be used as a widget object (bean convention, cf. getProject()).

Parameters:
projectName - the name of the project used to index resources

setProjectName

public void setProjectName(String projectName)
Sets the name of the project used to index resources.

Parameters:
projectName - the name of the project used to index resources

setRebuildMode

public void setRebuildMode(String rebuildMode)
Sets the rebuild mode of this search index.

Parameters:
rebuildMode - the rebuild mode of this search index {auto|manual}

setRequireViewPermission

public void setRequireViewPermission(boolean requireViewPermission)
Controls if a resource requires view permission to be displayed in the result list.

By default this is false.

Parameters:
requireViewPermission - controls if a resource requires view permission to be displayed in the result list

shutDown

public void shutDown()
Shuts down the search index.

This will close the local Lucene index searcher instance.


toString

public String toString()
Returns the name (getName()) of this search index.

Overrides:
toString in class Object
Returns:
the name (getName()) of this search index
See Also:
Object.toString()

appendCategoryFilter

protected org.apache.lucene.search.BooleanFilter appendCategoryFilter(CmsObject cms,
                                                                      org.apache.lucene.search.BooleanFilter filter,
                                                                      List<String> categories)
Appends a category filter to the given filter clause that matches all given categories.

In case the provided List is null or empty, the original filter is left unchanged.

The original filter parameter is extended and also provided as return value.

Parameters:
cms - the current OpenCms search context
filter - the filter to extend
categories - the categories that will compose the filter
Returns:
the extended filter clause

appendDateCreatedFilter

protected org.apache.lucene.search.BooleanFilter appendDateCreatedFilter(org.apache.lucene.search.BooleanFilter filter,
                                                                         long startTime,
                                                                         long endTime)
Appends a date of creation filter to the given filter clause that matches the given time range.

If the start time is equal to Long.MIN_VALUE and the end time is equal to Long.MAX_VALUE, then the original filter is left unchanged.

The original filter parameter is extended and also provided as return value.

Parameters:
filter - the filter to extend
startTime - start time of the range to search in
endTime - end time of the range to search in
Returns:
the extended filter clause

appendDateLastModifiedFilter

protected org.apache.lucene.search.BooleanFilter appendDateLastModifiedFilter(org.apache.lucene.search.BooleanFilter filter,
                                                                              long startTime,
                                                                              long endTime)
Appends a date of last modification filter to the given filter clause that matches the given time range.

If the start time is equal to Long.MIN_VALUE and the end time is equal to Long.MAX_VALUE, then the original filter is left unchanged.

The original filter parameter is extended and also provided as return value.

Parameters:
filter - the filter to extend
startTime - start time of the range to search in
endTime - end time of the range to search in
Returns:
the extended filter clause

appendPathFilter

protected org.apache.lucene.search.BooleanFilter appendPathFilter(CmsObject cms,
                                                                  org.apache.lucene.search.BooleanFilter filter,
                                                                  List<String> roots)
Appends a VFS path filter to the given filter clause that matches all given root paths.

In case the provided List is null or empty, the current request context site root is appended.

The original filter parameter is extended and also provided as return value.

Parameters:
cms - the current OpenCms search context
filter - the filter to extend
roots - the VFS root paths that will compose the filter
Returns:
the extended filter clause
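
The fallback described for appendPathFilter (use the site root of the current request context when no roots are given) can be pictured with plain collections. The sketch below does not touch Lucene or CmsObject; PathRootsSketch, effectiveRoots and the siteRoot parameter are hypothetical stand-ins for this example.

```java
import java.util.Collections;
import java.util.List;

final class PathRootsSketch {

    /**
     * Returns the roots that a path filter would be built from:
     * the given list, or only the site root if the list is null or empty.
     */
    static List<String> effectiveRoots(List<String> roots, String siteRoot) {
        if (roots == null || roots.isEmpty()) {
            return Collections.singletonList(siteRoot);
        }
        return roots;
    }
}
```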

appendResourceTypeFilter

protected org.apache.lucene.search.BooleanFilter appendResourceTypeFilter(CmsObject cms,
                                                                          org.apache.lucene.search.BooleanFilter filter,
                                                                          List<String> resourceTypes)
Appends a resource type filter to the given filter clause that matches all given resource types.

In case the provided List is null or empty, the original filter is left unchanged.

The original filter parameter is extended and also provided as return value.

Parameters:
cms - the current OpenCms search context
filter - the filter to extend
resourceTypes - the resource types that will compose the filter
Returns:
the extended filter clause

createDateRangeFilter

protected org.apache.lucene.search.Filter createDateRangeFilter(String fieldName,
                                                                long startTime,
                                                                long endTime)
Creates an optimized date range filter for the date of last modification or creation.

If the start date is equal to Long.MIN_VALUE and the end date is equal to Long.MAX_VALUE, then null is returned.

Parameters:
fieldName - the name of the field to search
startTime - start time of the range to search in
endTime - end time of the range to search in
Returns:
an optimized date range filter for the date of last modification or creation
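
The "unbounded range" rule shared by createDateRangeFilter and the two appendDate...Filter methods can be sketched without Lucene. In the example below a String description stands in for the real Filter object; DateRangeSketch, isUnbounded, describeRange and the output format are assumptions made for illustration only.

```java
final class DateRangeSketch {

    /** True if the range covers every possible timestamp, so no filter is needed. */
    static boolean isUnbounded(long startTime, long endTime) {
        return (startTime == Long.MIN_VALUE) && (endTime == Long.MAX_VALUE);
    }

    /**
     * Returns a textual stand-in for a range filter on the given field,
     * or null if the range is unbounded (the documented null case).
     */
    static String describeRange(String fieldName, long startTime, long endTime) {
        if (isUnbounded(startTime, endTime)) {
            return null;
        }
        return fieldName + ":[" + startTime + " TO " + endTime + "]";
    }
}
```

A caller that extends an existing filter would then skip the append step whenever null comes back, which matches the "original filter is left unchanged" wording of the append methods.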

createIndexBackup

protected String createIndexBackup()
Creates a backup of this index for optimized re-indexing of the whole content.

Returns:
the path to the backup folder, or null in case no backup was created

excludeFromIndex

protected boolean excludeFromIndex(CmsObject cms,
                                   CmsResource resource)
Checks if the provided resource should be excluded from this search index.

Parameters:
cms - the OpenCms context used for building the search index
resource - the resource to index
Returns:
true if the resource should be excluded, false if it should be included in this index

extendPathFilter

protected void extendPathFilter(org.apache.lucene.search.TermsFilter pathFilter,
                                String searchRoot)
Extends the given path query with another term for the given search root element.

Parameters:
pathFilter - the path filter to extend
searchRoot - the search root to add to the path query

getMultiTermQueryFilter

protected org.apache.lucene.search.Filter getMultiTermQueryFilter(String field,
                                                                  List<String> terms)
Returns a cached Lucene term query filter for the given field and terms.

Parameters:
field - the field to use
terms - the terms to use
Returns:
a cached Lucene term query filter for the given field and terms

getMultiTermQueryFilter

protected org.apache.lucene.search.Filter getMultiTermQueryFilter(String field,
                                                                  String terms)
Returns a cached Lucene term query filter for the given field and terms.

Parameters:
field - the field to use
terms - the terms to use
Returns:
a cached Lucene term query filter for the given field and terms

getMultiTermQueryFilter

protected org.apache.lucene.search.Filter getMultiTermQueryFilter(String field,
                                                                  String termsStr,
                                                                  List<String> termsList)
Returns a cached Lucene term query filter for the given field and terms.

Parameters:
field - the field to use
termsStr - the terms to use as a String separated by a space ' ' char
termsList - the list of terms to use
Returns:
a cached Lucene term query filter for the given field and terms
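
The three-argument variant takes the same terms in two shapes: a space-separated String and a List. The sketch below only shows how one shape can be converted into the other; how the actual method combines or caches them is not stated in this documentation. TermsSketch, split and join are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TermsSketch {

    /** Splits a space-separated terms String into a list, skipping repeated spaces. */
    static List<String> split(String termsStr) {
        String trimmed = termsStr.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(Arrays.asList(trimmed.split(" +")));
    }

    /** Joins a list of terms into a single String separated by one space. */
    static String join(List<String> termsList) {
        return String.join(" ", termsList);
    }
}
```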

getTermQueryFilter

protected org.apache.lucene.search.Filter getTermQueryFilter(String field,
                                                             String term)
Returns a cached Lucene term query filter for the given field and term.

Parameters:
field - the field to use
term - the term to use
Returns:
a cached Lucene term query filter for the given field and term

hasReadPermission

protected boolean hasReadPermission(CmsObject cms,
                                    org.apache.lucene.document.Document doc)
Checks if the OpenCms resource referenced by the result document can be read by the user of the given OpenCms context.

Parameters:
cms - the OpenCms user context to use for permission testing
doc - the search result document to check
Returns:
true if the user has read permissions to the resource

indexSearcherClose

protected void indexSearcherClose()
Closes the Lucene index searcher for this index.

See Also:
indexSearcherOpen(String)

indexSearcherClose

protected void indexSearcherClose(org.apache.lucene.search.IndexSearcher searcher)
Closes the given Lucene index searcher.

Parameters:
searcher - the searcher to close

indexSearcherOpen

protected void indexSearcherOpen(String path)
Initializes the Lucene index searcher for this index.

Use getSearcher() in order to obtain the searcher that has been opened.

In case there is an index searcher still open, it is closed first.

For performance reasons, one instance of the Lucene index searcher should be kept for all searches. However, if the index is updated or changed this searcher instance needs to be re-initialized.

Parameters:
path - the path to the index directory
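
The lifecycle described for indexSearcherOpen (keep one searcher, close a still-open one before opening another) follows a common holder pattern. The sketch below uses a stand-in class instead of the Lucene IndexSearcher; SearcherHolderSketch and FakeSearcher are hypothetical and say nothing about how CmsSearchIndex is actually implemented.

```java
final class SearcherHolderSketch {

    /** Stand-in for an index searcher; records its path and whether it was closed. */
    static final class FakeSearcher implements AutoCloseable {
        final String path;
        boolean closed;

        FakeSearcher(String path) {
            this.path = path;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** The single searcher instance shared by all searches, or null if none is open. */
    private FakeSearcher searcher;

    /** Closes any searcher that is still open, then opens a new one for the path. */
    synchronized void open(String path) {
        close();
        searcher = new FakeSearcher(path);
    }

    /** Closes the current searcher, if there is one. */
    synchronized void close() {
        if (searcher != null) {
            searcher.close();
            searcher = null;
        }
    }

    /** Returns the searcher that has been opened, or null. */
    synchronized FakeSearcher getSearcher() {
        return searcher;
    }
}
```

Reopening after an index change then amounts to calling open again with the same path, which is roughly the role the documentation gives to indexSearcherUpdate().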

indexSearcherUpdate

protected void indexSearcherUpdate()
Reopens the Lucene index search reader for this index, required after the index has been changed.

See Also:
indexSearcherOpen(String)

indexWriterCreate

protected I_CmsIndexWriter indexWriterCreate(boolean create)
                                      throws CmsIndexException
Creates a new index writer.

Parameters:
create - if true a whole new index is created, if false an existing index is updated
Returns:
the created new index writer
Throws:
CmsIndexException - in case the writer could not be created
See Also:
getIndexWriter(I_CmsReport, boolean)

indexWriterUnlock

protected void indexWriterUnlock(I_CmsReport report)
                          throws CmsIndexException
Unlocks the Lucene index writer of this index if required.

Parameters:
report - the report to write error messages on
Throws:
CmsIndexException - if unlocking of the index is impossible for any reason

isInTimeRange

protected boolean isInTimeRange(org.apache.lucene.document.Document doc,
                                CmsSearchParameters params)
Checks if the document is in the time range specified in the search parameters.

The creation date and/or the last modification date are checked.

Parameters:
doc - the document to check the dates against the given time range
params - the search parameters where the time ranges are specified
Returns:
true if the document is in the time range or no time range is set, otherwise false
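
A simplified picture of the isInTimeRange check, using plain long timestamps instead of a Lucene Document and CmsSearchParameters. The sketch assumes inclusive bounds, assumes that both dates must fall inside their range, and represents "no range set" by Long.MIN_VALUE and Long.MAX_VALUE; none of these details is confirmed by the documentation above. TimeRangeSketch and its methods are hypothetical.

```java
final class TimeRangeSketch {

    /** True if value lies within [min, max], bounds included (an assumption). */
    static boolean inRange(long value, long min, long max) {
        return (value >= min) && (value <= max);
    }

    /**
     * Checks the creation date and the last modification date against
     * their respective ranges; an unset range spans all long values.
     */
    static boolean isInTimeRange(
        long created, long lastModified,
        long minCreated, long maxCreated,
        long minModified, long maxModified) {

        return inRange(created, minCreated, maxCreated)
            && inRange(lastModified, minModified, maxModified);
    }
}
```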

prepareSortScoring

protected void prepareSortScoring(org.apache.lucene.search.IndexSearcher searcher,
                                  org.apache.lucene.search.Sort sort)
Checks if the score for the results must be calculated based on the provided sort option.

Since Lucene 3, the score is apparently no longer calculated by default, but only if the searcher is explicitly told to do so. This method checks whether, based on the given sort, the score must be calculated.

Parameters:
searcher - the index searcher to prepare
sort - the sort option to use

removeIndexBackup

protected void removeIndexBackup(String path)
Removes the given backup folder of this index.

Parameters:
path - the backup folder to remove
See Also:
isBackupReindexing()