package nlp

Natural language processing.

Linear Supertypes

Operators, AnyRef, Any

Type Members

trait Operators extends AnyRef

High-level NLP operators.

Value Members

def bigram(p: Double, minFreq: Int, text: String*): Array[BigramCollocation]

Identify bigram collocations whose p-value is less than the given threshold.
p
the p-value threshold
minFreq
the minimum frequency of a collocation.
text
input text.
returns
significant bigram collocations in descending order of likelihood ratio.

Definition Classes
Operators
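The entry does not say how the p-value is obtained. A common approach, assumed here, is to treat Dunning's log-likelihood ratio statistic (-2 log λ) as asymptotically chi-square distributed with one degree of freedom, so that P(χ²₁ > x) = erfc(√(x/2)). The minimal sketch below converts a statistic to a p-value with that identity, approximating erfc by the Abramowitz and Stegun formula 7.1.26 (absolute error below 1.5e-7), and then filters scored candidates by a threshold. ChiSquarePValue and its methods are hypothetical names, not part of this package.

```scala
object ChiSquarePValue {
  // Abramowitz & Stegun 7.1.26 approximation of erfc(x) for x >= 0 (|error| <= 1.5e-7).
  def erfc(x: Double): Double = {
    require(x >= 0.0, "this approximation is only used for x >= 0")
    val t = 1.0 / (1.0 + 0.3275911 * x)
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))))
    poly * math.exp(-x * x)
  }

  // Upper-tail p-value of a chi-square statistic with one degree of freedom:
  // P(X > stat) = erfc(sqrt(stat / 2)).
  def pValue(stat: Double): Double = erfc(math.sqrt(math.max(stat, 0.0) / 2.0))

  // Keep candidates whose p-value is below the threshold p, ordered by
  // descending statistic (equivalently, ascending p-value).
  def significant[A](scored: Seq[(A, Double)], p: Double): Seq[(A, Double)] =
    scored.filter { case (_, stat) => pValue(stat) < p }.sortBy { case (_, stat) => -stat }
}
```

For example, with p = 0.05 a candidate survives only when its statistic exceeds roughly 3.84, the 95th percentile of the chi-square distribution with one degree of freedom.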
def bigram(k: Int, minFreq: Int, text: String*): Array[BigramCollocation]

Identify bigram collocations (words that often appear consecutively) within corpora. They may also be used to find other associations between word occurrences.
Finding collocations requires first calculating the frequencies of words and their appearance in the context of other words. The collection of words will often then require filtering to retain only useful content terms. Each n-gram of words may then be scored according to some association measure to determine the relative likelihood of each n-gram being a collocation.
k
the number of top-scoring bigrams to return.
minFreq
the minimum frequency of a collocation.
text
input text.
returns
significant bigram collocations in descending order of likelihood ratio.

Definition Classes
Operators
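The description above counts words and bigrams and then scores each candidate with an association measure, and the returns line names the likelihood ratio. Assuming that measure is Dunning's log-likelihood ratio, a minimal sketch of the top-k variant looks like this. It counts over a single whitespace-tokenized token sequence and skips the stop-word and punctuation filtering the text mentions; BigramLLR and its methods are hypothetical names, not this package's API.

```scala
object BigramLLR {
  // k * log(x) + (n - k) * log(1 - x), using the convention 0 * log(0) = 0.
  private def logL(k: Double, n: Double, x: Double): Double = {
    def xlogy(a: Double, b: Double): Double = if (a == 0.0) 0.0 else a * math.log(b)
    xlogy(k, x) + xlogy(n - k, 1.0 - x)
  }

  // Dunning's log-likelihood ratio for the bigram (w1, w2), where
  // c1 = count(w1), c2 = count(w2), c12 = count(w1 w2), n = number of tokens.
  def score(c1: Double, c2: Double, c12: Double, n: Double): Double = {
    val p = c2 / n                    // P(w2) under independence
    val p1 = c12 / c1                 // P(w2 | w1)
    val p2 = (c2 - c12) / (n - c1)    // P(w2 | not w1)
    2.0 * (logL(c12, c1, p1) + logL(c2 - c12, n - c1, p2) -
      logL(c12, c1, p) - logL(c2 - c12, n - c1, p))
  }

  // Top-k bigrams occurring at least minFreq times, by descending likelihood ratio.
  def topK(k: Int, minFreq: Int, tokens: Seq[String]): Seq[((String, String), Double)] = {
    val unigrams = tokens.groupBy(identity).map { case (w, ws) => w -> ws.size }
    val bigrams = tokens.zip(tokens.drop(1)).groupBy(identity).map { case (b, bs) => b -> bs.size }
    val n = tokens.size.toDouble
    bigrams.toSeq
      .filter { case (_, c12) => c12 >= minFreq }
      .map { case ((w1, w2), c12) => ((w1, w2), score(unigrams(w1), unigrams(w2), c12, n)) }
      .sortBy { case (_, s) => -s }
      .take(k)
  }
}
```

On the token sequence "new york is big . new york is busy . i love new york" with minFreq = 2, only ("new", "york") and ("york", "is") qualify, and the first scores higher because "york" never follows any other word.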
def corpus(text: Seq[String]): SimpleCorpus

Creates an in-memory text corpus.
text
a sequence of texts.

Definition Classes
Operators
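The entry does not describe what SimpleCorpus stores internally. As a hypothetical illustration of the bookkeeping an in-memory corpus commonly keeps, the sketch below holds the documents together with corpus-wide term counts and document frequencies. InMemoryCorpus and its members are assumptions, not this package's SimpleCorpus API, and the naive lowercase whitespace tokenization is a simplification.

```scala
// Hypothetical in-memory corpus: documents plus term and document frequencies.
final class InMemoryCorpus(docs: Seq[String]) {
  // Naive tokenization: lowercase, split on whitespace, drop empty tokens.
  private val tokenized: Seq[Seq[String]] =
    docs.map(_.toLowerCase.split("\\s+").toSeq.filter(_.nonEmpty))

  val size: Int = docs.size
  val numTokens: Int = tokenized.map(_.size).sum

  // Total occurrences of each term across all documents.
  private val termCounts: Map[String, Int] =
    tokenized.flatten.groupBy(identity).map { case (t, ts) => t -> ts.size }

  // Number of documents containing each term at least once.
  private val docCounts: Map[String, Int] =
    tokenized.flatMap(_.distinct).groupBy(identity).map { case (t, ts) => t -> ts.size }

  def termFrequency(term: String): Int = termCounts.getOrElse(term, 0)
  def documentFrequency(term: String): Int = docCounts.getOrElse(term, 0)
  def document(i: Int): String = docs(i)
}
```

Precomputing both maps at construction keeps lookups constant-time, which suits the small, fully in-memory collections this factory method targets.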
val lancaster: LancasterStemmer { def apply(word: String): String }

The Paice/Husk Lancaster stemming algorithm. The stemmer is a conflation-based iterative stemmer. Although efficient and easy to implement, it is known to be very strong and aggressive. It uses a single table of rules, each of which may specify the removal or replacement of an ending.
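To make the single-rule-table idea concrete, here is a minimal sketch of a Paice/Husk-style iterative stemmer. Assumptions: the six rules are illustrative only and far smaller than the published table (which also supports extra flags, such as rules restricted to unstemmed words); the acceptability checks follow the conditions commonly given for the algorithm (a stem starting with a vowel keeps at least two letters, one starting with a consonant keeps at least three, including a vowel or 'y'). StemRule and LancasterSketch are hypothetical names, not this package's LancasterStemmer.

```scala
import scala.annotation.tailrec

// One rule: if the word ends with `ending`, drop `remove` characters, append
// `append`, then either keep stemming (keepGoing) or stop.
final case class StemRule(ending: String, remove: Int, append: String, keepGoing: Boolean)

object LancasterSketch {
  // Illustrative rules only. Order matters: "ss" must be tried before "s".
  val rules: Seq[StemRule] = Seq(
    StemRule("ness", 4, "", keepGoing = true),
    StemRule("ful", 3, "", keepGoing = true),
    StemRule("ing", 3, "", keepGoing = true),
    StemRule("ed", 2, "", keepGoing = true),
    StemRule("ss", 0, "", keepGoing = false), // protect "-ss" words and stop
    StemRule("s", 1, "", keepGoing = true)
  )

  private def isVowel(c: Char): Boolean = "aeiou".indexOf(c) >= 0

  // Acceptability conditions checked on the candidate stem before a rule fires.
  private def acceptable(stem: String): Boolean =
    if (stem.isEmpty) false
    else if (isVowel(stem.head)) stem.length >= 2
    else stem.length >= 3 && stem.exists(c => isVowel(c) || c == 'y')

  private def applyRule(word: String, r: StemRule): String =
    word.dropRight(r.remove) + r.append

  def stem(word: String): String = {
    // Apply the first matching, acceptable rule; repeat while rules say continue.
    @tailrec
    def loop(w: String): String =
      rules.find(r => w.endsWith(r.ending) && acceptable(applyRule(w, r))) match {
        case None => w
        case Some(r) =>
          val next = applyRule(w, r)
          if (r.keepGoing) loop(next) else next
      }
    loop(word.toLowerCase)
  }
}
```

The iteration is what makes the stemmer aggressive: "hopefulness" loses "ness" and then "ful", ending at "hope".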
def ngram(maxNGramSize: Int, minFreq: Int, text: String*): Seq[Seq[NGram]]

An Apriori-like algorithm to extract n-gram phrases.
maxNGramSize
the maximum length of an n-gram.
minFreq
the minimum frequency of an n-gram in the sentences.
text
input text.
returns
a sequence of n-gram sequences; the i-th entry holds the i-grams.

Definition Classes
Operators
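The Apriori idea carried over to phrases is that an n-gram can only be frequent if its shorter sub-grams are frequent, so each level only counts candidates built from the previous level's survivors. The minimal sketch below applies that pruning to the (n-1)-word prefix and suffix of each n-gram and, like the returns line above, leaves entry 0 empty so that entry i holds the i-grams. AprioriNGrams is a hypothetical name; the package's implementation may differ, for example in tokenization or stop-word filtering.

```scala
import scala.collection.mutable

object AprioriNGrams {
  // Level-wise n-gram extraction; entry n holds frequent n-grams with counts.
  def extract(maxN: Int, minFreq: Int, sentences: Seq[String]): Seq[Seq[(Seq[String], Int)]] = {
    val tokenized: Seq[Seq[String]] =
      sentences.map(_.toLowerCase.split("\\s+").toSeq.filter(_.nonEmpty))
    val levels = mutable.ArrayBuffer[Seq[(Seq[String], Int)]](Seq.empty) // entry 0 is empty
    var frequent = Set.empty[Seq[String]]

    for (n <- 1 to maxN) {
      val counts = mutable.Map.empty[Seq[String], Int]
      for (tokens <- tokenized; gram <- tokens.sliding(n) if gram.size == n) {
        // Apriori pruning: count an n-gram only if both its (n-1)-word prefix
        // and suffix were frequent at the previous level.
        if (n == 1 || (frequent.contains(gram.init) && frequent.contains(gram.tail)))
          counts(gram) = counts.getOrElse(gram, 0) + 1
      }
      val level = counts.toSeq
        .filter { case (_, c) => c >= minFreq }
        .sortBy { case (g, c) => (-c, g.mkString(" ")) }
      levels += level
      frequent = level.map(_._1).toSet
    }
    levels.toList
  }
}
```

With the sentences "new york city", "new york city hall" and "new york state" and minFreq = 2, level 3 keeps only "new york city": "york city hall" and "new york state" are pruned because "city hall" and "york state" were not frequent bigrams.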
implicit def pimpString(string: String): PimpedString
val porter: PorterStemmer { def apply(word: String): String }

Porter's stemming algorithm. The stemmer is based on the idea that suffixes in English are mostly made up of combinations of smaller and simpler suffixes. It is a linear step stemmer with five steps, each applying its own set of rules. Within a step, the rule with the longest suffix matching the word is selected, and the conditions attached to that rule are tested on the stem that would remain if the suffix were removed in the way the rule defines. If the conditions pass, the rule fires and the suffix is removed or replaced; otherwise the word is left unchanged by that step. In either case, control then moves to the next step.
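A minimal sketch of the step mechanics follows: a generic step runner plus step 1a and the first rules of step 1b, including Porter's measure m (the number of vowel-consonant sequences in a stem) used by the "(m > 0)" condition. It follows Porter's 1980 convention that only the rule with the longest matching suffix is considered in a step. It is deliberately partial (steps 2 to 5 and the step 1b clean-up rules are omitted), and SuffixRule and PorterSketch are hypothetical names, not this package's PorterStemmer.

```scala
// One rule in a step: if the word ends with `suffix` and the remaining stem
// satisfies `condition`, replace the suffix with `replacement`.
final case class SuffixRule(suffix: String, replacement: String, condition: String => Boolean)

object PorterSketch {
  // A letter is a consonant unless it is a, e, i, o, u, or a 'y' after a consonant.
  private def isConsonant(w: String, i: Int): Boolean = w(i) match {
    case 'a' | 'e' | 'i' | 'o' | 'u' => false
    case 'y' => i == 0 || !isConsonant(w, i - 1)
    case _ => true
  }

  // Measure m in the form [C](VC)^m[V]: count vowel-to-consonant transitions.
  def measure(stem: String): Int =
    (1 until stem.length).count(i => !isConsonant(stem, i - 1) && isConsonant(stem, i))

  private def hasVowel(stem: String): Boolean = stem.indices.exists(i => !isConsonant(stem, i))

  // Select the rule with the longest matching suffix; it fires only if its
  // condition holds on the stem, otherwise the step leaves the word unchanged.
  def applyStep(word: String, rules: Seq[SuffixRule]): String =
    rules.filter(r => word.endsWith(r.suffix)).sortBy(r => -r.suffix.length).headOption match {
      case Some(r) =>
        val stem = word.dropRight(r.suffix.length)
        if (r.condition(stem)) stem + r.replacement else word
      case None => word
    }

  private val always: String => Boolean = _ => true

  // Step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (empty).
  val step1a: Seq[SuffixRule] = Seq(
    SuffixRule("sses", "ss", always),
    SuffixRule("ies", "i", always),
    SuffixRule("ss", "ss", always),
    SuffixRule("s", "", always)
  )

  // First rules of step 1b: (m>0) EED -> EE, (*v*) ED -> (empty), (*v*) ING -> (empty).
  val step1bPart: Seq[SuffixRule] = Seq(
    SuffixRule("eed", "ee", stem => measure(stem) > 0),
    SuffixRule("ed", "", hasVowel),
    SuffixRule("ing", "", hasVowel)
  )
}
```

The longest-match convention matters for a word like "feed": "eed" is selected, its condition fails because m("f") = 0, and the shorter "ed" rule is never tried, so the word stays "feed" while "agreed" becomes "agree".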
def postag(sentence: String): Array[PennTreebankPOS]

Part-of-speech tagger. Assigns a Penn Treebank part-of-speech tag to each word of a sentence.
sentence
a sentence.
returns
the POS tags.

Definition Classes
Operators
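The entry does not say which model sits behind postag. One classic approach is a hidden Markov model decoded with the Viterbi algorithm, and the minimal sketch below shows only that decoding step on a three-tag toy model. The probabilities are invented for illustration; a real tagger would estimate them from a tagged corpus and handle unknown words far more carefully than the fixed floor used here. ViterbiTagger and its fields are hypothetical names.

```scala
object ViterbiTagger {
  val tags: Array[String] = Array("DT", "NN", "VBZ")

  // Toy HMM parameters, invented for illustration only.
  val start: Array[Double] = Array(0.8, 0.1, 0.1)
  val trans: Array[Array[Double]] = Array(
    Array(0.05, 0.90, 0.05), // from DT
    Array(0.10, 0.20, 0.70), // from NN
    Array(0.50, 0.30, 0.20)  // from VBZ
  )
  val emit: Array[Map[String, Double]] = Array(
    Map("the" -> 1.0),
    Map("dog" -> 0.6, "barks" -> 0.4),
    Map("barks" -> 0.9, "dog" -> 0.1)
  )
  private val unseen = 1e-6 // floor probability for words a tag never emits

  private def logEmit(t: Int, w: String): Double = math.log(emit(t).getOrElse(w, unseen))

  // Most probable tag sequence under the model, computed in log space.
  def tag(words: Array[String]): Array[String] =
    if (words.isEmpty) Array.empty[String]
    else {
      val n = words.length
      val k = tags.length
      val score = Array.ofDim[Double](n, k)
      val back = Array.ofDim[Int](n, k)
      for (t <- 0 until k) score(0)(t) = math.log(start(t)) + logEmit(t, words(0))
      for (i <- 1 until n; t <- 0 until k) {
        // Best previous tag for reaching tag t at position i.
        val prev = (0 until k).maxBy(p => score(i - 1)(p) + math.log(trans(p)(t)))
        score(i)(t) = score(i - 1)(prev) + math.log(trans(prev)(t)) + logEmit(t, words(i))
        back(i)(t) = prev
      }
      // Trace back from the best final tag.
      val path = Array.ofDim[Int](n)
      path(n - 1) = (0 until k).maxBy(t => score(n - 1)(t))
      for (i <- n - 1 until 0 by -1) path(i - 1) = back(i)(path(i))
      path.map(tags(_))
    }
}
```

Working in log space avoids numeric underflow on long sentences, and the backpointer table lets the single best path be recovered without storing every candidate sequence.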

Inherited from Operators

Inherited from AnyRef

Inherited from Any

Ungrouped