public final class CJKBigramFilterFactory extends AbstractTokenFilterFactory
Factory that creates a CJKBigramFilter to form bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
CJK types are set by these tokenizers, but you can also use flags to explicitly control which of the CJK scripts are turned into bigrams.
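To illustrate the idea of per-script control, here is a minimal, stdlib-only Java sketch. It is not Lucene's implementation: `CjkScriptFlags` and `shouldBigram` are hypothetical names, and `Character.UnicodeScript` stands in for the token types that StandardTokenizer and ICUTokenizer actually assign. The set of enabled scripts plays the role of the flags that decide which CJK scripts are bigrammed.

```java
import java.util.EnumSet;
import java.util.Set;

public class CjkScriptFlags {
    // Hypothetical helper: true if the code point belongs to a script enabled
    // for bigramming. Character.UnicodeScript is an assumed stand-in for the
    // token types set by the tokenizers.
    static boolean shouldBigram(int codePoint, Set<Character.UnicodeScript> enabled) {
        return enabled.contains(Character.UnicodeScript.of(codePoint));
    }

    public static void main(String[] args) {
        // Bigram Han and Hangul only; Hiragana is left out of the enabled set.
        Set<Character.UnicodeScript> enabled =
                EnumSet.of(Character.UnicodeScript.HAN, Character.UnicodeScript.HANGUL);
        "漢字ひらがな한글".codePoints().forEach(cp ->
                System.out.println(new String(Character.toChars(cp)) + " -> " + shouldBigram(cp, enabled)));
    }
}
```

Modeling the flags as an `EnumSet` keeps the "which scripts" decision separate from the bigramming itself.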
By default, when a CJK character has no adjacent characters to form a bigram,
it is output in unigram form. If you want to always output both unigrams and
bigrams, set the outputUnigrams flag. This can be used for a combined
unigram+bigram approach.
In all cases, all non-CJK input is passed through unmodified.
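The unigram fallback, the outputUnigrams behavior, and the pass-through of non-CJK input can be illustrated with a simplified sketch. It assumes the input is already tokenized with one CJK character per token and treats consecutive CJK tokens as adjacent; the real CJKBigramFilter operates on a Lucene TokenStream rather than a list of strings. `BigramSketch`, `bigram`, `flush`, and `isCjk` are hypothetical names, and the CJK test via `Character.UnicodeScript` is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BigramSketch {
    // Simplified CJK test (assumption): Han, Hiragana, Katakana and Hangul count as CJK.
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
                || s == Character.UnicodeScript.HIRAGANA
                || s == Character.UnicodeScript.KATAKANA
                || s == Character.UnicodeScript.HANGUL;
    }

    // Assumes one CJK character per input token; consecutive CJK tokens are adjacent.
    static List<String> bigram(List<String> tokens, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        List<String> run = new ArrayList<>(); // current run of adjacent CJK tokens
        for (String tok : tokens) {
            boolean singleCjkChar = tok.codePointCount(0, tok.length()) == 1 && isCjk(tok.codePointAt(0));
            if (singleCjkChar) {
                run.add(tok);
            } else {
                flush(run, outputUnigrams, out);
                out.add(tok); // non-CJK input passes through unmodified
            }
        }
        flush(run, outputUnigrams, out);
        return out;
    }

    // Emits the current run: a lone character becomes a unigram; otherwise each
    // adjacent pair becomes a bigram, with unigrams interleaved if requested.
    static void flush(List<String> run, boolean outputUnigrams, List<String> out) {
        if (run.size() == 1) {
            out.add(run.get(0));
        } else {
            for (int i = 0; i < run.size(); i++) {
                if (outputUnigrams) {
                    out.add(run.get(i));
                }
                if (i + 1 < run.size()) {
                    out.add(run.get(i) + run.get(i + 1));
                }
            }
        }
        run.clear();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("東", "京", "is", "大");
        System.out.println(bigram(tokens, false)); // [東京, is, 大]
        System.out.println(bigram(tokens, true));  // [東, 東京, 京, is, 大]
    }
}
```

In the sketch, the lone 大 is emitted as a unigram in both modes, while 東京 becomes a bigram that gains its unigrams only when outputUnigrams is set.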
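Fields inherited from class org.elasticsearch.index.analysis.AbstractTokenFilterFactory: version
Fields inherited from class org.elasticsearch.index.AbstractIndexComponent: deprecationLogger, index, indexSettings, logger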
Constructor Summary

Constructor and Description |
---|
CJKBigramFilterFactory(Index index, IndexSettingsService indexSettingsService, String name, Settings settings) |
Method Summary

Modifier and Type | Method and Description |
---|---|
org.apache.lucene.analysis.TokenStream | create(org.apache.lucene.analysis.TokenStream tokenStream) |
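Methods inherited from class org.elasticsearch.index.analysis.AbstractTokenFilterFactory: name, version
Methods inherited from class org.elasticsearch.index.AbstractIndexComponent: index, indexSettings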
Constructor Detail

@Inject public CJKBigramFilterFactory(Index index, IndexSettingsService indexSettingsService, String name, Settings settings)
Copyright © 2009–2016. All rights reserved.