Extract noun phrases and their counts from a given body of text.
Inspired by http://dragon.ischool.drexel.edu/xtract.asp
Using POS tags as described here
This function implements something akin to the xTract description, by doing the following:
- Sentence detection (to only pull phrases from individual sentences)
- POS ("Part of Speech") tagging, to allow processing only certain parts of speech (nouns, etc.)
- "Chunking", to restrict phrase extraction to appropriate sub-sentence structures
- Filtering chunks to "noun phrases" only
- Extraction of words based on some simple rules on POS:
  - First word can be either a noun or an adjective
  - Select other adjectives/nouns within a threshold of the "first word"
  - From this selection, prepare n-grams such that the last word of each n-gram must be a noun
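The POS-based extraction rules in the last step can be sketched roughly as follows. This is an illustration only, not the class's actual code: it assumes sentence detection, tagging and chunking have already happened upstream, that tags follow the Penn Treebank convention (NN* for nouns, JJ* for adjectives), and that "threshold" means a window of consecutive tokens starting at the first word. The names phrases_from_chunk and count_phrases are hypothetical.

```python
from collections import Counter


def is_noun(tag):
    # Assumption: Penn Treebank-style tags, where noun tags start with "NN".
    return tag.startswith("NN")


def is_adjective(tag):
    # Assumption: Penn Treebank-style tags, where adjective tags start with "JJ".
    return tag.startswith("JJ")


def phrases_from_chunk(tagged, threshold=3):
    """Return candidate phrases from one noun-phrase chunk.

    tagged is a list of (word, tag) pairs. threshold is interpreted here
    as the maximum number of tokens in a phrase, counting the first word.
    """
    phrases = []
    for start, (_, tag) in enumerate(tagged):
        # Rule 1: the first word must be a noun or an adjective.
        if not (is_noun(tag) or is_adjective(tag)):
            continue
        # Rule 2: collect the run of nouns/adjectives inside the window.
        selected = []
        for word, word_tag in tagged[start:start + threshold]:
            if not (is_noun(word_tag) or is_adjective(word_tag)):
                break
            selected.append((word, word_tag))
        # Rule 3: emit every n-gram from the first word that ends in a noun.
        for end in range(1, len(selected) + 1):
            if is_noun(selected[end - 1][1]):
                phrases.append(" ".join(w for w, _ in selected[:end]))
    return phrases


def count_phrases(chunks, threshold=3):
    """Count phrases across an iterable of tagged noun-phrase chunks."""
    counts = Counter()
    for chunk in chunks:
        counts.update(phrases_from_chunk(chunk, threshold))
    return counts


if __name__ == "__main__":
    chunk = [("the", "DT"), ("fast", "JJ"), ("network", "NN"), ("card", "NN")]
    print(count_phrases([chunk]))
```

Under these assumptions the determiner is never a starting word, and "fast" on its own is not emitted because an n-gram must end in a noun. A real implementation may treat the threshold differently, for example by skipping over intervening non-noun tokens rather than stopping at them.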
This class is not thread-safe.