EnglishTextTokenizer (Nitrite 2.0.1 API)

java.lang.Object
- org.dizitart.no2.fulltext.EnglishTextTokenizer

All Implemented Interfaces:

TextTokenizer
```
public class EnglishTextTokenizer
extends java.lang.Object
implements TextTokenizer
```
A TextTokenizer implementation for the English language.

Since:

1.0

Constructor Summary

Constructors
Constructor and Description

EnglishTextTokenizer()

Method Summary

Modifier and Type	Method and Description
`protected java.lang.String`	`convertWord(java.lang.String word)` Converts a `word` to upper case and checks whether it is a known stop word in the English language.
`java.util.Set<java.lang.String>`	`stopWords()` Gets all stop-words for a language.
`java.util.Set<java.lang.String>`	`tokenize(java.lang.String text)` Tokenizes a `text` and discards all stop-words from it.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - EnglishTextTokenizer
```
public EnglishTextTokenizer()
```
- Method Detail
  - tokenize
```
public java.util.Set<java.lang.String> tokenize(java.lang.String text)
                                         throws java.io.IOException
```
    Description copied from interface: TextTokenizer
    
    Tokenizes a text and discards all stop-words from it.
    
    Specified by:
    
    tokenize in interface TextTokenizer
    
    Parameters:
    
    text - the text to tokenize
    
    Returns:
    
    the set of tokens.
    
    Throws:
    
    java.io.IOException - if a low-level I/O error occurs.
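    
    The tokenize-then-filter behavior described above can be illustrated with a small stand-alone sketch. It does not use Nitrite's classes: `TokenizeSketch`, its delimiter rule (split on any run of non-alphanumeric characters), and its four-word stop list are assumptions made for illustration, not the library's actual implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of Nitrite.
class TokenizeSketch {
    // Assumed, deliberately tiny stop-word list, stored in upper case
    // so it can be compared against converted words.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("A", "AN", "THE", "IS"));

    static Set<String> tokenize(String text) {
        // LinkedHashSet keeps first-seen order and removes duplicates.
        Set<String> tokens = new LinkedHashSet<>();
        // Assumed delimiter rule: any run of non-alphanumeric characters.
        for (String word : text.split("[^A-Za-z0-9]+")) {
            if (word.isEmpty()) {
                continue; // empty input or a leading delimiter yields an empty piece
            }
            String upper = word.toUpperCase(Locale.ENGLISH);
            if (!STOP_WORDS.contains(upper)) {
                tokens.add(upper);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The quick fox is a fox"));
        // prints [QUICK, FOX]
    }
}
```

    Returning a `Set` means a repeated word such as "fox" contributes a single token, which matches the `java.util.Set<java.lang.String>` return type documented for `tokenize`.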
  - stopWords
```
public java.util.Set<java.lang.String> stopWords()
```
    Description copied from interface: TextTokenizer
    
    Gets all stop-words for a language.
    
    Specified by:
    
    stopWords in interface TextTokenizer
    
    Returns:
    
    the set of all stop-words.
  - convertWord
```
protected java.lang.String convertWord(java.lang.String word)
```
    Converts a word to upper case and checks whether it is a known stop word in the English language. If it is, the word is discarded and is not considered a valid token.
    
    Parameters:
    
    word - the word
    
    Returns:
    
    the tokenized word in all upper case.
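    
    A minimal sketch of the per-word conversion described above, written without Nitrite on the classpath. `ConvertWordSketch`, its three-word stop list, and the use of `null` to signal a discarded word are all assumptions for illustration; this page does not state what the real method returns for a stop word.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of Nitrite.
class ConvertWordSketch {
    // Assumed stop-word list, stored in upper case.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("AND", "OF", "THE"));

    // Returns the upper-cased word, or null (an assumed "discard" signal)
    // when the word is a stop word.
    static String convertWord(String word) {
        String upper = word.toUpperCase(Locale.ENGLISH);
        return STOP_WORDS.contains(upper) ? null : upper;
    }

    public static void main(String[] args) {
        System.out.println(convertWord("nitrite")); // prints NITRITE
        System.out.println(convertWord("The"));     // prints null
    }
}
```

    Because the documented `convertWord` is `protected`, a subclass of `EnglishTextTokenizer` could override it to change how individual words are normalized.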
