Index
All Classes and Interfaces|All Packages|Serialized Form
$
- $ - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Punctuation $
A
- Abbreviations - Interface in smile.nlp.dictionary
-
A dictionary interface for abbreviations.
- add(Text) - Method in class smile.nlp.SimpleCorpus
-
Adds a document to the corpus.
- addAnchor(String) - Method in interface smile.nlp.AnchorText
-
Adds a link label to the anchor text.
- addAnchor(String) - Method in class smile.nlp.SimpleText
- addChild(K[], V, int) - Method in class smile.nlp.Trie.Node
-
Adds a child.
- AnchorText - Interface in smile.nlp
-
The anchor text is the visible, clickable text in a hyperlink.
- apply(String) - Method in class smile.nlp.embedding.Word2Vec
-
Returns the embedding vector of a word.
- apply(String) - Method in interface smile.nlp.tokenizer.Tokenizer
- avgDocSize() - Method in interface smile.nlp.Corpus
-
Returns the average size of documents in the corpus.
- avgDocSize() - Method in class smile.nlp.SimpleCorpus
B
- Bigram - Class in smile.nlp
-
Bigrams or digrams are groups of two words, and are very commonly used as the basis for simple statistical analysis of text.
- Bigram - Class in smile.nlp.collocation
-
Collocations are expressions of multiple words which commonly co-occur.
- Bigram(String, String) - Constructor for class smile.nlp.Bigram
-
Constructor.
- Bigram(String, String, int, double) - Constructor for class smile.nlp.collocation.Bigram
-
Constructor.
- bigrams() - Method in interface smile.nlp.Corpus
-
Returns the iterator over the bigrams in the corpus.
- bigrams() - Method in class smile.nlp.SimpleCorpus
- BM25 - Class in smile.nlp.relevance
-
The BM25 weighting scheme, often called Okapi weighting, after the system in which it was first implemented, was developed as a way of building a probabilistic model sensitive to term frequency and document length while not introducing too many additional parameters into the model.
- BM25() - Constructor for class smile.nlp.relevance.BM25
-
Default constructor with k1 = 1.2, b = 0.75, delta = 1.0.
- BM25(double, double, double) - Constructor for class smile.nlp.relevance.BM25
-
Constructor.
- body - Variable in class smile.nlp.Text
-
The text body.
- BreakIteratorSentenceSplitter - Class in smile.nlp.tokenizer
-
A sentence splitter based on the java.text.BreakIterator, which supports multiple natural languages (selected by locale setting).
- BreakIteratorSentenceSplitter() - Constructor for class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
-
Constructor for the default locale.
- BreakIteratorSentenceSplitter(Locale) - Constructor for class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
-
Constructor for the given locale.
- BreakIteratorTokenizer - Class in smile.nlp.tokenizer
-
A word tokenizer based on the java.text.BreakIterator, which supports multiple natural languages (selected by locale setting).
- BreakIteratorTokenizer() - Constructor for class smile.nlp.tokenizer.BreakIteratorTokenizer
-
Constructor for the default locale.
- BreakIteratorTokenizer(Locale) - Constructor for class smile.nlp.tokenizer.BreakIteratorTokenizer
-
Constructor for the given locale.
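The BreakIterator-based entries above rely on the JDK class java.text.BreakIterator. As a rough illustration of that underlying mechanism (not the Smile implementation itself), the sketch below splits text into sentences with the JDK API only. The class name SentenceSplitSketch and its split method are hypothetical names chosen for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of smile.nlp: shows locale-aware
// sentence splitting with the JDK's java.text.BreakIterator.
final class SentenceSplitSketch {
    /** Splits text into trimmed, non-empty sentences for the given locale. */
    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : split("Hello there. How are you?", Locale.US)) {
            System.out.println(s);
        }
    }
}
```

Swapping BreakIterator.getSentenceInstance for BreakIterator.getWordInstance gives the word-level analogue, although the word iterator also reports whitespace and punctuation spans that a tokenizer would normally filter out.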
C
- CC - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Coordinating conjunction.
- CD - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Cardinal number.
- CLOSING_PARENTHESIS - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Punctuation ) ] }
- CLOSING_QUOTATION - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Punctuation ' or ''
- COLON - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Punctuation ; : ...
- COMMA - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Punctuation ,
- compareTo(Bigram) - Method in class smile.nlp.collocation.Bigram
- compareTo(NGram) - Method in class smile.nlp.collocation.NGram
- compareTo(Relevance) - Method in class smile.nlp.relevance.Relevance
- COMPREHENSIVE - Enum constant in enum class smile.nlp.dictionary.EnglishStopWords
-
A very long list of stop words.
- CONCISE - Enum constant in enum class smile.nlp.dictionary.EnglishDictionary
-
A concise dictionary of common terms in English.
- contains(String) - Method in interface smile.nlp.dictionary.Dictionary
-
Returns true if this dictionary contains the specified word.
- contains(String) - Method in enum class smile.nlp.dictionary.EnglishDictionary
- contains(String) - Method in class smile.nlp.dictionary.EnglishPunctuations
- contains(String) - Method in enum class smile.nlp.dictionary.EnglishStopWords
- contains(String) - Method in class smile.nlp.dictionary.SimpleDictionary
- CooccurrenceKeywords - Interface in smile.nlp.keyword
-
Keyword extraction from a single document using word co-occurrence statistical information.
- Corpus - Interface in smile.nlp
-
A corpus is a collection of documents.
- count - Variable in class smile.nlp.collocation.Bigram
-
The frequency of bigram in the corpus.
- count - Variable in class smile.nlp.collocation.NGram
-
The frequency of n-gram in the corpus.
- count(String) - Method in interface smile.nlp.Corpus
-
Returns the total frequency of the term in the corpus.
- count(String) - Method in class smile.nlp.SimpleCorpus
- count(Bigram) - Method in interface smile.nlp.Corpus
-
Returns the total frequency of the bigram in the corpus.
- count(Bigram) - Method in class smile.nlp.SimpleCorpus
D
- DASH - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Punctuation -
- DEFAULT - Enum constant in enum class smile.nlp.dictionary.EnglishStopWords
-
Default stop words list.
- Dictionary - Interface in smile.nlp.dictionary
-
A dictionary is a set of words in some natural language.
- dimension() - Method in class smile.nlp.embedding.Word2Vec
-
Returns the dimension of embedding vector space.
- DT - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Determiner.
E
- EnglishDictionary - Enum Class in smile.nlp.dictionary
-
A concise dictionary of common terms in English.
- EnglishPOSLexicon - Class in smile.nlp.pos
-
An English lexicon with part-of-speech tags.
- EnglishPunctuations - Class in smile.nlp.dictionary
-
Punctuation marks in English.
- EnglishStopWords - Enum Class in smile.nlp.dictionary
-
Several sets of English stop words.
- equals(Object) - Method in class smile.nlp.Bigram
- equals(Object) - Method in class smile.nlp.NGram
- equals(Object) - Method in class smile.nlp.SimpleText
- EX - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Existential there.
F
- fit(String[][], PennTreebankPOS[][]) - Static method in class smile.nlp.pos.HMMPOSTagger
-
Fits an HMM POS tagger by maximum likelihood estimation.
- FW - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Foreign word.
G
- get(String) - Method in class smile.nlp.embedding.Word2Vec
-
Returns the embedding vector of a word.
- get(String) - Static method in class smile.nlp.pos.EnglishPOSLexicon
-
Returns the part-of-speech tags for given word, or null if the word does not exist in the dictionary.
- get(K) - Method in class smile.nlp.Trie
-
Returns the node of a given key.
- get(K[]) - Method in class smile.nlp.Trie
-
Returns the associated value of a given key.
- getAbbreviation(String) - Method in interface smile.nlp.dictionary.Abbreviations
-
Returns the abbreviation for a word.
- getAnchor() - Method in interface smile.nlp.AnchorText
-
Returns the anchor text if any.
- getAnchor() - Method in class smile.nlp.SimpleText
-
Returns the anchor text if any.
- getChild(K) - Method in class smile.nlp.Trie.Node
-
Returns the child with the key.
- getChild(K[], int) - Method in class smile.nlp.Trie.Node
-
Returns the value matching the key sequence.
- getDefault() - Static method in class smile.nlp.pos.HMMPOSTagger
-
Returns the default English POS tagger.
- getFull(String) - Method in interface smile.nlp.dictionary.Abbreviations
-
Returns the full word of an abbreviation.
- getInstance() - Static method in class smile.nlp.dictionary.EnglishPunctuations
-
Returns the singleton instance.
- getInstance() - Static method in class smile.nlp.normalizer.SimpleNormalizer
-
Returns the singleton instance.
- getInstance() - Static method in class smile.nlp.tokenizer.PennTreebankTokenizer
-
Returns the singleton instance.
- getInstance() - Static method in class smile.nlp.tokenizer.SimpleParagraphSplitter
-
Returns the singleton instance.
- getInstance() - Static method in class smile.nlp.tokenizer.SimpleSentenceSplitter
-
Returns the singleton instance.
- getKey() - Method in class smile.nlp.Trie.Node
-
Returns the key.
- getValue() - Method in class smile.nlp.Trie.Node
-
Returns the value.
- getValue(String) - Static method in enum class smile.nlp.pos.PennTreebankPOS
-
Returns an enum value from a string.
- GloVe - Class in smile.nlp.embedding
-
Global Vectors for Word Representation.
- GloVe() - Constructor for class smile.nlp.embedding.GloVe
- GOOGLE - Enum constant in enum class smile.nlp.dictionary.EnglishStopWords
-
The stop words list used by Google.
H
- hashCode() - Method in class smile.nlp.Bigram
- hashCode() - Method in class smile.nlp.NGram
- hashCode() - Method in class smile.nlp.SimpleText
- HMMPOSTagger - Class in smile.nlp.pos
-
Part-of-speech tagging with hidden Markov model.
- HMMPOSTagger() - Constructor for class smile.nlp.pos.HMMPOSTagger
-
Constructor.
I
- id - Variable in class smile.nlp.Text
-
The id of document in the corpus.
- IN - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Preposition or subordinating conjunction.
- iterator() - Method in interface smile.nlp.dictionary.Dictionary
-
Returns an iterator over the words in this dictionary.
- iterator() - Method in enum class smile.nlp.dictionary.EnglishDictionary
- iterator() - Method in class smile.nlp.dictionary.EnglishPunctuations
- iterator() - Method in enum class smile.nlp.dictionary.EnglishStopWords
- iterator() - Method in class smile.nlp.dictionary.SimpleDictionary
J
- JJ - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Adjective.
- JJR - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Adjective, comparative.
- JJS - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Adjective, superlative.
L
- LancasterStemmer - Class in smile.nlp.stemmer
-
The Paice/Husk Lancaster stemming algorithm.
- LancasterStemmer() - Constructor for class smile.nlp.stemmer.LancasterStemmer
-
Constructor with default rules.
- LancasterStemmer(boolean) - Constructor for class smile.nlp.stemmer.LancasterStemmer
-
Constructor with default rules.
- LancasterStemmer(InputStream) - Constructor for class smile.nlp.stemmer.LancasterStemmer
-
Constructor with customized rules.
- LancasterStemmer(InputStream, boolean) - Constructor for class smile.nlp.stemmer.LancasterStemmer
-
Constructor with customized rules.
- LS - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
List item marker.
M
- main(String[]) - Static method in class smile.nlp.pos.HMMPOSTagger
-
Trains the default model on the WSJ and Brown datasets.
- maxtf() - Method in class smile.nlp.SimpleText
- maxtf() - Method in interface smile.nlp.TextTerms
-
Returns the maximum term frequency over all terms in the document.
- MD - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Modal verb.
- MYSQL - Enum constant in enum class smile.nlp.dictionary.EnglishStopWords
-
The stop words list used by MySQL FullText feature.
N
- nbigram() - Method in interface smile.nlp.Corpus
-
Returns the number of bigrams in the corpus.
- nbigram() - Method in class smile.nlp.SimpleCorpus
- ndoc() - Method in interface smile.nlp.Corpus
-
Returns the number of documents in the corpus.
- ndoc() - Method in class smile.nlp.SimpleCorpus
- NGram - Class in smile.nlp.collocation
-
An n-gram is a contiguous sequence of n words from a given sequence of text.
- NGram - Class in smile.nlp
-
An n-gram is a contiguous sequence of n words from a given sequence of text.
- NGram(String[]) - Constructor for class smile.nlp.NGram
-
Constructor.
- NGram(String[], int) - Constructor for class smile.nlp.collocation.NGram
-
Constructor.
- NN - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Noun, singular or mass.
- NNP - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Proper noun, singular.
- NNPS - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Proper noun, plural.
- NNS - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Noun, plural.
- Node(K) - Constructor for class smile.nlp.Trie.Node
-
Constructor.
- normalize(String) - Method in interface smile.nlp.normalizer.Normalizer
-
Normalize the given string.
- normalize(String) - Method in class smile.nlp.normalizer.SimpleNormalizer
- Normalizer - Interface in smile.nlp.normalizer
-
Normalization transforms text into a canonical form by removing unwanted variations.
- nterm() - Method in interface smile.nlp.Corpus
-
Returns the number of unique terms in the corpus.
- nterm() - Method in class smile.nlp.SimpleCorpus
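The NGram entries above describe an n-gram as a contiguous sequence of n words. A minimal sketch of that definition, assuming the text is already tokenized into a String array, is a sliding window over the words. NGramSketch and ngrams are hypothetical names for this example and do not correspond to any smile.nlp method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of smile.nlp: enumerates every
// contiguous window of n words in a tokenized sentence.
final class NGramSketch {
    static List<String[]> ngrams(String[] words, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String[]> result = new ArrayList<>();
        // The last window starts at words.length - n.
        for (int i = 0; i + n <= words.length; i++) {
            result.add(Arrays.copyOfRange(words, i, i + n));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"natural", "language", "processing", "toolkit"};
        for (String[] gram : ngrams(words, 2)) {
            System.out.println(String.join(" ", gram));
        }
    }
}
```

With n = 2 this yields the bigrams of the sentence. A sentence shorter than n words yields no n-grams.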
O
- of(String) - Static method in interface smile.nlp.keyword.CooccurrenceKeywords
-
Returns the top 10 keywords.
- of(String, int) - Static method in interface smile.nlp.keyword.CooccurrenceKeywords
-
Returns a given number of top keywords.
- of(Path) - Static method in class smile.nlp.embedding.GloVe
-
Loads a GloVe model.
- of(Path) - Static method in class smile.nlp.embedding.Word2Vec
-
Loads a pre-trained word2vec model from binary file of ByteOrder.LITTLE_ENDIAN.
- of(Path, ByteOrder) - Static method in class smile.nlp.embedding.Word2Vec
-
Loads a pre-trained word2vec model from binary file.
- of(Collection<String[]>, int, int) - Static method in class smile.nlp.collocation.NGram
-
Extracts n-gram phrases by an Apriori-like algorithm.
- of(Corpus, double, int) - Static method in class smile.nlp.collocation.Bigram
-
Finds bigram collocations in the given corpus whose p-value is less than the given threshold.
- of(Corpus, int, int) - Static method in class smile.nlp.collocation.Bigram
-
Finds top k bigram collocations in the given corpus.
- open - Variable in enum class smile.nlp.pos.PennTreebankPOS
-
True if the POS is an open class.
- OPENING_PARENTHESIS - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Punctuation ( [ {
- OPENING_QUOTATION - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Punctuation ` or ``
P
- ParagraphSplitter - Interface in smile.nlp.tokenizer
-
A paragraph splitter segments text into paragraphs.
- PDT - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Predeterminer.
- PennTreebankPOS - Enum Class in smile.nlp.pos
-
The Penn Treebank Tag set.
- PennTreebankTokenizer - Class in smile.nlp.tokenizer
-
A word tokenizer that tokenizes English sentences using the conventions used by the Penn Treebank.
- PorterStemmer - Class in smile.nlp.stemmer
-
Porter's stemming algorithm.
- PorterStemmer() - Constructor for class smile.nlp.stemmer.PorterStemmer
-
Constructor.
- POS - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Possessive ending.
- POSTagger - Interface in smile.nlp.pos
-
Part-of-speech tagging (POS tagging) is the process of marking up the words in a sentence as corresponding to a particular part of speech.
- POUND - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Punctuation #
- PRP - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Personal pronoun.
- PRP$ - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Possessive pronoun.
- Punctuations - Interface in smile.nlp.dictionary
-
Punctuation marks are symbols that indicate the structure and organization of written language, as well as intonation and pauses to be observed when reading aloud.
- put(K[], V) - Method in class smile.nlp.Trie
-
Adds a key with associated value to the trie.
R
- rank(int, int, long, long) - Method in class smile.nlp.relevance.TFIDF
-
Returns the relevance score between a term and a document based on a corpus.
- rank(Corpus, TextTerms, String[], int[], int) - Method in class smile.nlp.relevance.BM25
- rank(Corpus, TextTerms, String[], int[], int) - Method in interface smile.nlp.relevance.RelevanceRanker
-
Returns the relevance score between a set of terms and a document based on a corpus.
- rank(Corpus, TextTerms, String[], int[], int) - Method in class smile.nlp.relevance.TFIDF
- rank(Corpus, TextTerms, String, int, int) - Method in class smile.nlp.relevance.BM25
- rank(Corpus, TextTerms, String, int, int) - Method in interface smile.nlp.relevance.RelevanceRanker
-
Returns the relevance score between a term and a document based on a corpus.
- rank(Corpus, TextTerms, String, int, int) - Method in class smile.nlp.relevance.TFIDF
- RB - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Adverb.
- RBR - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Adverb, comparative.
- RBS - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Adverb, superlative.
- read(Path, List<String[]>, List<PennTreebankPOS[]>) - Static method in class smile.nlp.pos.HMMPOSTagger
-
Loads training data from a corpus.
- Relevance - Class in smile.nlp.relevance
-
In the context of information retrieval, relevance denotes how well a retrieved set of documents meets the information need of the user.
- Relevance(Text, double) - Constructor for class smile.nlp.relevance.Relevance
-
Constructor.
- RelevanceRanker - Interface in smile.nlp.relevance
-
An interface to provide relevance ranking algorithm.
- RP - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Particle.
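The rank entries above return a relevance score between a term and a document, and the BM25 entry describes a model sensitive to term frequency and document length. The sketch below shows one common textbook form of the BM25+ term score, using the defaults named in this index (k1 = 1.2, b = 0.75, delta = 1.0). It is an assumption-laden illustration, not the formula in smile.nlp.relevance.BM25: the class name Bm25Sketch, the parameter names, and the particular idf variant are all choices made for this example.

```java
// Hypothetical sketch of a textbook BM25+ term score; the exact
// formula used by smile.nlp.relevance.BM25 may differ.
final class Bm25Sketch {
    private final double k1;    // term-frequency saturation
    private final double b;     // strength of document-length normalization
    private final double delta; // lower bound added for long documents (BM25+)

    Bm25Sketch(double k1, double b, double delta) {
        this.k1 = k1;
        this.b = b;
        this.delta = delta;
    }

    /**
     * @param tf         frequency of the term in the document
     * @param docSize    number of words in the document
     * @param avgDocSize average document size in the corpus
     * @param nDocs      number of documents in the corpus
     * @param docFreq    number of documents containing the term
     */
    double score(double tf, int docSize, double avgDocSize, long nDocs, long docFreq) {
        if (tf <= 0) {
            return 0.0;
        }
        // Documents longer than average are penalized, shorter ones boosted.
        double lengthNorm = 1.0 - b + b * docSize / avgDocSize;
        double tfPart = (k1 + 1.0) * tf / (tf + k1 * lengthNorm);
        // Assumed idf variant: the "+ 1" inside the log keeps it non-negative.
        double idf = Math.log(1.0 + (nDocs - docFreq + 0.5) / (docFreq + 0.5));
        return idf * (tfPart + delta);
    }

    public static void main(String[] args) {
        Bm25Sketch bm25 = new Bm25Sketch(1.2, 0.75, 1.0);
        System.out.println(bm25.score(3, 120, 100.0, 1000, 25));
    }
}
```

The score grows with term frequency but saturates, drops as the document gets longer than average, and rises for terms that appear in fewer documents.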
S
- score - Variable in class smile.nlp.collocation.Bigram
-
The chi-square statistical score of the collocation.
- score - Variable in class smile.nlp.relevance.Relevance
-
The relevance score.
- score(double, int, double, long, long) - Method in class smile.nlp.relevance.BM25
-
Returns the relevance score between a term and a document based on a corpus.
- score(double, long, long) - Method in class smile.nlp.relevance.BM25
-
Returns the relevance score between a term and a document based on a corpus.
- score(int, int, double, int, int, double, int, int, double, long, long) - Method in class smile.nlp.relevance.BM25
-
Returns the relevance score between a term and a document based on a corpus.
- search(String) - Method in interface smile.nlp.Corpus
-
Returns the iterator over the set of documents containing the given term.
- search(String) - Method in class smile.nlp.SimpleCorpus
- search(RelevanceRanker, String) - Method in interface smile.nlp.Corpus
-
Returns the iterator over the set of documents containing the given term in descending order of relevance.
- search(RelevanceRanker, String) - Method in class smile.nlp.SimpleCorpus
- search(RelevanceRanker, String[]) - Method in interface smile.nlp.Corpus
-
Returns the iterator over the set of documents containing (at least one of) the given terms in descending order of relevance.
- search(RelevanceRanker, String[]) - Method in class smile.nlp.SimpleCorpus
- SENT - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Sentence-break punctuation .
- SentenceSplitter - Interface in smile.nlp.tokenizer
-
A sentence splitter segments text into sentences (a string of words satisfying the grammatical rules of a language).
- setAnchor(String) - Method in interface smile.nlp.AnchorText
-
Sets the anchor text.
- setAnchor(String) - Method in class smile.nlp.SimpleText
-
Sets the anchor text.
- SimpleCorpus - Class in smile.nlp
-
An in-memory text corpus.
- SimpleCorpus() - Constructor for class smile.nlp.SimpleCorpus
-
Constructor.
- SimpleCorpus(SentenceSplitter, Tokenizer, StopWords, Punctuations) - Constructor for class smile.nlp.SimpleCorpus
-
Constructor.
- SimpleDictionary - Class in smile.nlp.dictionary
-
A simple implementation of dictionary interface.
- SimpleDictionary(String) - Constructor for class smile.nlp.dictionary.SimpleDictionary
-
Constructor.
- SimpleNormalizer - Class in smile.nlp.normalizer
-
A baseline normalizer for processing Unicode text.
- SimpleParagraphSplitter - Class in smile.nlp.tokenizer
-
This is a simple paragraph splitter.
- SimpleSentenceSplitter - Class in smile.nlp.tokenizer
-
This is a simple sentence splitter for English.
- SimpleText - Class in smile.nlp
-
A list-of-words representation of documents.
- SimpleText(String, String, String, String[]) - Constructor for class smile.nlp.SimpleText
-
Constructor.
- SimpleTokenizer - Class in smile.nlp.tokenizer
-
A word tokenizer that tokenizes English sentences with some differences from TreebankWordTokenizer, notably on handling not-contractions.
- SimpleTokenizer() - Constructor for class smile.nlp.tokenizer.SimpleTokenizer
-
Constructor.
- SimpleTokenizer(boolean) - Constructor for class smile.nlp.tokenizer.SimpleTokenizer
-
Constructor.
- size() - Method in interface smile.nlp.Corpus
-
Returns the number of words in the corpus.
- size() - Method in interface smile.nlp.dictionary.Dictionary
-
Returns the number of words in this dictionary.
- size() - Method in enum class smile.nlp.dictionary.EnglishDictionary
- size() - Method in class smile.nlp.dictionary.EnglishPunctuations
- size() - Method in enum class smile.nlp.dictionary.EnglishStopWords
- size() - Method in class smile.nlp.dictionary.SimpleDictionary
- size() - Method in class smile.nlp.SimpleCorpus
- size() - Method in class smile.nlp.SimpleText
- size() - Method in interface smile.nlp.TextTerms
-
Returns the number of words.
- size() - Method in class smile.nlp.Trie
-
Returns the number of entries.
- smile.nlp - package smile.nlp
-
Natural language processing.
- smile.nlp.collocation - package smile.nlp.collocation
-
Collocation finding algorithms.
- smile.nlp.dictionary - package smile.nlp.dictionary
-
Common dictionaries such as stop words, punctuation, common English words, etc.
- smile.nlp.embedding - package smile.nlp.embedding
-
Word embedding.
- smile.nlp.keyword - package smile.nlp.keyword
-
Keyword extraction.
- smile.nlp.normalizer - package smile.nlp.normalizer
-
Text normalization.
- smile.nlp.pos - package smile.nlp.pos
-
Part-of-speech taggers.
- smile.nlp.relevance - package smile.nlp.relevance
-
Term-document relevance ranking algorithms.
- smile.nlp.stemmer - package smile.nlp.stemmer
-
English word stemmer algorithms.
- smile.nlp.tokenizer - package smile.nlp.tokenizer
-
Sentence splitter and word tokenizer.
- split(String) - Method in class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
- split(String) - Method in class smile.nlp.tokenizer.BreakIteratorTokenizer
- split(String) - Method in interface smile.nlp.tokenizer.ParagraphSplitter
-
Splits the text into paragraphs.
- split(String) - Method in class smile.nlp.tokenizer.PennTreebankTokenizer
- split(String) - Method in interface smile.nlp.tokenizer.SentenceSplitter
-
Splits the text into sentences.
- split(String) - Method in class smile.nlp.tokenizer.SimpleParagraphSplitter
- split(String) - Method in class smile.nlp.tokenizer.SimpleSentenceSplitter
- split(String) - Method in class smile.nlp.tokenizer.SimpleTokenizer
- split(String) - Method in interface smile.nlp.tokenizer.Tokenizer
-
Splits the string into a list of tokens.
- stem(String) - Method in class smile.nlp.stemmer.LancasterStemmer
- stem(String) - Method in class smile.nlp.stemmer.PorterStemmer
- stem(String) - Method in interface smile.nlp.stemmer.Stemmer
-
Transforms a word into its root form.
- Stemmer - Interface in smile.nlp.stemmer
-
A Stemmer transforms a word into its root form.
- StopWords - Interface in smile.nlp.dictionary
-
A set of stop words in some language.
- stripPluralParticiple(String) - Method in class smile.nlp.stemmer.PorterStemmer
-
Removes plurals and participles.
- SYM - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Symbol.
T
- tag(String[]) - Method in class smile.nlp.pos.HMMPOSTagger
- tag(String[]) - Method in interface smile.nlp.pos.POSTagger
-
Tags the sentence in the form of a sequence of words.
- terms() - Method in interface smile.nlp.Corpus
-
Returns the iterator over the terms in the corpus.
- terms() - Method in class smile.nlp.SimpleCorpus
- text - Variable in class smile.nlp.relevance.Relevance
-
The document to rank.
- Text - Class in smile.nlp
-
A minimal interface of text in the corpus.
- Text(String) - Constructor for class smile.nlp.Text
-
Constructor.
- Text(String, String) - Constructor for class smile.nlp.Text
-
Constructor.
- Text(String, String, String) - Constructor for class smile.nlp.Text
-
Constructor.
- TextTerms - Interface in smile.nlp
-
The terms in a text.
- tf(String) - Method in class smile.nlp.SimpleText
- tf(String) - Method in interface smile.nlp.TextTerms
-
Returns the term frequency.
- TFIDF - Class in smile.nlp.relevance
-
The tf-idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining.
- TFIDF() - Constructor for class smile.nlp.relevance.TFIDF
-
Constructor.
- TFIDF(double) - Constructor for class smile.nlp.relevance.TFIDF
-
Constructor.
- title - Variable in class smile.nlp.Text
-
The title of document.
- TO - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
to.
- Tokenizer - Interface in smile.nlp.tokenizer
-
A token is a string of characters, categorized according to the rules as a symbol.
- toString() - Method in class smile.nlp.Bigram
- toString() - Method in class smile.nlp.collocation.Bigram
- toString() - Method in class smile.nlp.collocation.NGram
- toString() - Method in class smile.nlp.NGram
- toString() - Method in class smile.nlp.SimpleText
- Trie<K,V> - Class in smile.nlp
-
A trie, also called digital tree or prefix tree, is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.
- Trie() - Constructor for class smile.nlp.Trie
-
Constructor.
- Trie(int) - Constructor for class smile.nlp.Trie
-
Constructor.
- Trie.Node - Class in smile.nlp
-
The nodes in the trie.
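The Trie entries above describe a prefix tree whose keys are sequences, with put(K[], V) and get(K[]) operations. The following is a minimal sketch of that data structure under simple assumptions (hash-map children, non-null values). TrieSketch is a hypothetical class written for this index and is not the smile.nlp.Trie source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not smile.nlp.Trie: a prefix tree keyed by
// sequences of K, storing a value at the node that ends each key.
final class TrieSketch<K, V> {
    private static final class Node<K, V> {
        final Map<K, Node<K, V>> children = new HashMap<>();
        V value; // null means no key ends at this node
    }

    private final Node<K, V> root = new Node<>();
    private int size = 0;

    /** Associates a non-null value with the key sequence. */
    void put(K[] key, V value) {
        if (value == null) {
            throw new IllegalArgumentException("null values are not supported in this sketch");
        }
        Node<K, V> node = root;
        for (K k : key) {
            node = node.children.computeIfAbsent(k, x -> new Node<K, V>());
        }
        if (node.value == null) {
            size++; // first time this exact key is stored
        }
        node.value = value;
    }

    /** Returns the value stored for the exact key sequence, or null if absent. */
    V get(K[] key) {
        Node<K, V> node = root;
        for (K k : key) {
            node = node.children.get(k);
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    /** Returns the number of stored keys. */
    int size() {
        return size;
    }

    public static void main(String[] args) {
        TrieSketch<String, Integer> trie = new TrieSketch<>();
        trie.put(new String[]{"new", "york"}, 1);
        trie.put(new String[]{"new", "york", "city"}, 2);
        System.out.println(trie.get(new String[]{"new", "york"}));
    }
}
```

Keys that share a prefix share the nodes for that prefix, which is why a lookup costs time proportional to the key length rather than to the number of stored entries.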
U
- UH - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Interjection.
- unique() - Method in class smile.nlp.SimpleText
- unique() - Method in interface smile.nlp.TextTerms
-
Returns the iterator of unique words.
V
- valueOf(String) - Static method in enum class smile.nlp.dictionary.EnglishDictionary
-
Returns the enum constant of this class with the specified name.
- valueOf(String) - Static method in enum class smile.nlp.dictionary.EnglishStopWords
-
Returns the enum constant of this class with the specified name.
- valueOf(String) - Static method in enum class smile.nlp.pos.PennTreebankPOS
-
Returns the enum constant of this class with the specified name.
- values() - Static method in enum class smile.nlp.dictionary.EnglishDictionary
-
Returns an array containing the constants of this enum class, in the order they are declared.
- values() - Static method in enum class smile.nlp.dictionary.EnglishStopWords
-
Returns an array containing the constants of this enum class, in the order they are declared.
- values() - Static method in enum class smile.nlp.pos.PennTreebankPOS
-
Returns an array containing the constants of this enum class, in the order they are declared.
- VB - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Verb, base form.
- VBD - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Verb, past tense.
- VBG - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Verb, gerund or present participle.
- VBN - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Verb, past participle.
- VBP - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Verb, non-3rd person singular present.
- VBZ - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Verb, 3rd person singular present.
- vectors - Variable in class smile.nlp.embedding.Word2Vec
-
The vector space.
W
- w1 - Variable in class smile.nlp.Bigram
-
Immutable first word of bigram.
- w2 - Variable in class smile.nlp.Bigram
-
Immutable second word of bigram.
- walkin(File, List<File>) - Static method in class smile.nlp.pos.HMMPOSTagger
-
Recursive function to descend into the directory tree and find all the files that end with ".POS".
- WDT - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Wh-determiner.
- Word2Vec - Class in smile.nlp.embedding
-
Word2vec is a group of related models that are used to produce word embeddings.
- Word2Vec(String[], float[][]) - Constructor for class smile.nlp.embedding.Word2Vec
-
Constructor.
- words - Variable in class smile.nlp.embedding.Word2Vec
-
The vocabulary.
- words - Variable in class smile.nlp.NGram
-
Immutable word sequences.
- words() - Method in class smile.nlp.SimpleText
- words() - Method in interface smile.nlp.TextTerms
-
Returns the iterator of the words of the document.
- WP - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Wh-pronoun.
- WP$ - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Possessive wh-pronoun.
- WRB - Enum constant in enum class smile.nlp.pos.PennTreebankPOS
-
Wh-adverb.