This file contains 225 high-frequency n-grams from title prefixes. "High-frequency" means the S2 * Dblp bucket size is greater than 1M (as of early September 2015). n is 2, 3, 4, or 5.
Returns a list of n-grams. If cutoff is specified, more words are added until the resulting n-gram has a frequency lower than the cutoff value. If allowTruncated is set to true, n-grams shorter than n are accepted. For example, if the text is "local backbones" and n = 3, the n-gram "local_backbones" is generated.
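The behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the actual implementation: the function name `ngrams`, the `frequency` callback, the sliding-window strategy, and the snake_case parameter `allow_truncated` (standing in for allowTruncated) are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence


def ngrams(
    tokens: Sequence[str],
    n: int,
    frequency: Optional[Callable[[str], int]] = None,  # hypothetical frequency lookup
    cutoff: Optional[int] = None,
    allow_truncated: bool = False,
) -> List[str]:
    """Build underscore-joined n-grams from a token sequence (illustrative sketch)."""
    result: List[str] = []

    # Fewer than n tokens: only emit a shorter n-gram when truncation is allowed.
    if len(tokens) < n:
        if allow_truncated and tokens:
            result.append("_".join(tokens))
        return result

    for start in range(len(tokens) - n + 1):
        end = start + n
        gram = "_".join(tokens[start:end])
        # Assumed cutoff rule: keep appending words while the n-gram is still
        # at or above the cutoff and there are words left to append.
        if cutoff is not None and frequency is not None:
            while frequency(gram) >= cutoff and end < len(tokens):
                end += 1
                gram = "_".join(tokens[start:end])
        result.append(gram)
    return result


# The example from the description: two words, n = 3, truncation allowed.
print(ngrams(["local", "backbones"], 3, allow_truncated=True))  # ['local_backbones']
```

Passing the frequency lookup as a callback keeps the sketch independent of however the real high-frequency table is stored.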
This is used in V1.
Returns the array of tokens for the given input, limiting the number of tokens to maxCount.
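As a rough sketch of a tokenizer with a token limit: the lowercasing, the `\w+` splitting rule, and the name `tokenize` are assumptions chosen for illustration, and the real tokenization rules may differ. The parameter `max_count` stands in for maxCount.

```python
import re
from typing import List, Optional


def tokenize(text: str, max_count: Optional[int] = None) -> List[str]:
    """Split text into lowercase word tokens, keeping at most max_count of them."""
    # Assumed rule: a token is a maximal run of word characters.
    tokens = re.findall(r"\w+", text.lower())
    return tokens if max_count is None else tokens[:max_count]


print(tokenize("Local Backbones: a survey", max_count=2))  # ['local', 'backbones']
```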
This contains a set of helper functions copied from the pipeline code. We need them here to anticipate how well the pipeline will work with the output from science-parse.