All Known Implementing Classes:

OpenNlpLinguistics, SimpleLinguistics
```
public interface Linguistics
```
Factory of linguistic processors. For technical reasons, this interface allows separate components to be supplied for different operations, which is more flexibility than most cases need. In particular, the tokenizer should typically stem, transform and normalize using the same operations as those provided directly by this factory. A set of adaptors is provided that makes this easy to achieve; refer to the `com.yahoo.language.simple.SimpleLinguistics` implementation to set this up.

Thread safety: Instances of this factory type must be thread safe, but the processors returned by the factory methods need not be. Clients should request a separate processor instance for each thread.
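The thread-safety contract above can be sketched as one shared factory handing out one processor per thread. The types below (`WordProcessor`, `ProcessorFactory`, `LowercasingProcessor`) are hypothetical stand-ins invented for this illustration, not the real `com.yahoo.language` types, and `ThreadLocal` is only one possible way to keep instances separate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a thread-unsafe linguistic processor.
interface WordProcessor {
    String process(String input);
}

// Hypothetical stand-in for a thread-safe factory of such processors.
interface ProcessorFactory {
    WordProcessor newProcessor();
}

final class PerThreadProcessors {

    // Deliberately thread-unsafe: the buffer is reused between calls.
    static final class LowercasingProcessor implements WordProcessor {
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public String process(String input) {
            buffer.setLength(0);
            for (int i = 0; i < input.length(); i++)
                buffer.append(Character.toLowerCase(input.charAt(i)));
            return buffer.toString();
        }
    }

    static List<String> processAll(ProcessorFactory factory, List<String> inputs) {
        // The factory is shared; each worker thread lazily gets its own processor.
        ThreadLocal<WordProcessor> perThread = ThreadLocal.withInitial(factory::newProcessor);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs)
                futures.add(pool.submit(() -> perThread.get().process(input)));
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures)
                results.add(future.get());
            return results; // same order as the inputs
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ProcessorFactory factory = LowercasingProcessor::new; // stateless, safe to share
        System.out.println(processAll(factory, List.of("Hello", "WORLD"))); // [hello, world]
    }
}
```

Sharing a single `LowercasingProcessor` across the pool instead would let concurrent calls overwrite each other's buffer, which is the kind of failure the contract is meant to prevent.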

Author:

Mathias Mølster Lidal, Simon Thoresen Hult, bratseth

Nested Class Summary

Nested Classes
Modifier and Type	Interface	Description
`static class`	`Linguistics.Component`

Method Summary

Modifier and Type	Method	Description
`boolean`	`equals(Linguistics other)`	Checks whether another instance is equivalent to this one.
`CharacterClasses`	`getCharacterClasses()`	Returns a thread-unsafe character classes instance.
`Detector`	`getDetector()`	Returns a thread-unsafe detector.
`GramSplitter`	`getGramSplitter()`	Returns a thread-unsafe gram splitter.
`Normalizer`	`getNormalizer()`	Returns a thread-unsafe normalizer.
`Segmenter`	`getSegmenter()`	Returns a thread-unsafe segmenter.
`Stemmer`	`getStemmer()`	Returns a thread-unsafe stemmer or lemmatizer.
`Tokenizer`	`getTokenizer()`	Returns a thread-unsafe tokenizer.
`Transformer`	`getTransformer()`	Returns a thread-unsafe transformer.

- Method Detail
  - getStemmer
```
Stemmer getStemmer()
```
    Returns a thread-unsafe stemmer or lemmatizer. This is used at query time to stem search terms for indexes that contain text tokenized with stemming turned on.
  - getTokenizer
```
Tokenizer getTokenizer()
```
    Returns a thread-unsafe tokenizer. This is used at indexing time to produce an optionally stemmed and transformed (accent-normalized) stream of indexable tokens.
  - getNormalizer
```
Normalizer getNormalizer()
```
    Returns a thread-unsafe normalizer. This is used at query time to CJK-normalize query text.
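To give a feel for what CJK normalization of query text can mean, the sketch below applies Unicode NFKC with the JDK's `java.text.Normalizer`, which folds fullwidth and halfwidth variants into their canonical forms. That NFKC is the form used is an assumption made for this illustration; the class name `CjkNormalizeSketch` is hypothetical and this is not the `Normalizer` interface returned by `getNormalizer()`.

```java
import java.text.Normalizer;

// Illustration only: NFKC folding of width variants, assumed here as one
// plausible form of CJK normalization.
final class CjkNormalizeSketch {

    static String normalize(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Fullwidth Latin letters A, B, C (U+FF21..U+FF23) become ASCII.
        System.out.println(normalize("\uFF21\uFF22\uFF23")); // ABC
        // Halfwidth katakana KA (U+FF76) becomes regular katakana KA (U+30AB).
        System.out.println(normalize("\uFF76").equals("\u30AB")); // true
    }
}
```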
  - getTransformer
```
Transformer getTransformer()
```
    Returns a thread-unsafe transformer. This is used at query time to accent-normalize search terms for indexes that contain text tokenized with accent normalization turned on.
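Accent normalization can be illustrated with the JDK alone. The sketch below decomposes characters with `java.text.Normalizer` and drops the combining marks; it shows why query terms need the same treatment as indexed tokens, but it is not the `Transformer` API, and `AccentSketch` and `stripAccents` are hypothetical names.

```java
import java.text.Normalizer;

// Illustration only: a common JDK idiom for removing accents.
final class AccentSketch {

    static String stripAccents(String text) {
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks, which are then removed.
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String indexedToken = stripAccents("Cr\u00e8me"); // what an accent-normalizing index would store
        String rawQuery = "Cr\u00e8me";

        System.out.println(indexedToken);                                // Creme
        System.out.println(rawQuery.equals(indexedToken));               // false: raw query misses
        System.out.println(stripAccents(rawQuery).equals(indexedToken)); // true: normalized query matches
    }
}
```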
  - getSegmenter
```
Segmenter getSegmenter()
```
    Returns a thread-unsafe segmenter. This is used at query time to find the individual semantic components of search terms for indexes tokenized with segmentation.
  - getDetector
```
Detector getDetector()
```
    Returns a thread-unsafe detector. The language of the text is a parameter to other linguistic operations. This is used to determine the language of a query or document field when not specified explicitly.
  - getGramSplitter
```
GramSplitter getGramSplitter()
```
    Returns a thread-unsafe gram splitter. This is used to split query or document text into fixed-length grams, which allows matching without needing or using segmented tokens.
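The idea of fixed-length grams can be sketched in a few lines of plain Java. This is not the `GramSplitter` API: `GramSketch` and `grams` are hypothetical, and the choices made here (overlapping sliding window, the whole input treated as one run, a shorter input returned as a single gram, lengths counted in UTF-16 units) are assumptions for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: overlapping fixed-length grams over a single run of text.
final class GramSketch {

    static List<String> grams(String text, int n) {
        if (n < 1) throw new IllegalArgumentException("Gram size must be at least 1");
        List<String> result = new ArrayList<>();
        if (text.isEmpty()) return result;
        if (text.length() <= n) { // assumption: short input becomes one gram
            result.add(text);
            return result;
        }
        // Slide a window of n UTF-16 units one position at a time.
        for (int i = 0; i + n <= text.length(); i++)
            result.add(text.substring(i, i + n));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(grams("hello", 2)); // [he, el, ll, lo]
        System.out.println(grams("hi", 3));    // [hi]
    }
}
```

Because the query and the document are split the same way, a query for "ell" yields the grams "el" and "ll", both of which also occur among the grams of "hello", so the match works without any segmentation step.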
  - getCharacterClasses
```
CharacterClasses getCharacterClasses()
```
    Returns a thread-unsafe character classes instance.
  - equals
```
boolean equals(Linguistics other)
```
    Checks whether another instance is equivalent to this one.
