Object

org.allenai.scienceparse.pipeline

Bucketizers


object Bucketizers

A collection of helper functions copied from the pipeline code. They are kept here to anticipate how well the pipeline will work with the output from science-parse.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. val concatChar: String

  7. def cutoffFilter(b: String, cutoffOption: Option[Int], highFreqs: Map[String, Int]): Boolean

  8. val defaultAllowTruncated: Boolean

  9. val defaultNameCutoffThreshold: Int

  10. val defaultNameNgramLength: Int

  11. val defaultTitleCutoffThreshold: Int

  12. val defaultTitleNgramLength: Int

  13. val defaultUpto: Int

  14. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  16. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  17. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  18. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  19. val highFreqNameNgramStream: InputStream

  20. lazy val highFreqNameNgrams: Map[String, Int]

  21. val highFreqTitleNgramStream: InputStream

    This file contains 225 high-frequency n-grams from title prefixes. "High-frequency" means the S2 * Dblp bucket size exceeds 1M (as of early September 2015). The n-gram length n is 2, 3, 4, or 5.

  22. lazy val highFreqTitleNgrams: Map[String, Int]

  23. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  24. def loadHighFreqs(is: InputStream): Map[String, Int]

  25. def nameNgrams(name: String): Iterator[String]

  26. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. def ngramAux(chunks: Array[String], n: Int, cutoffOption: Option[Int], allowTruncated: Boolean, highFreqs: Map[String, Int], upto: Int): Iterator[String]

  28. def ngrams(text: String, n: Int, cutoffOption: Option[Int], allowTruncated: Boolean = defaultAllowTruncated, highFreqs: Map[String, Int] = highFreqTitleNgrams, upto: Int = defaultUpto): Iterator[String]

    Returns an iterator over n-grams. If a cutoff is specified, words are added to an n-gram until its frequency falls below the cutoff value. If allowTruncated is true, n-grams with fewer than n words are accepted. For example, if the text is "local backbones" and n = 3, the n-gram "local_backbones" is generated.
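
    The behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Bucketizers code: the names NgramSketch, tokens and ngramsSketch are hypothetical, the "_" separator is taken from the "local_backbones" example, whitespace tokenization is an assumption, and the upto parameter is left out because its semantics are not documented here. Emitting one n-gram per starting position is also an assumption.

```scala
object NgramSketch {
  // Separator inferred from the "local_backbones" example; the real
  // value of Bucketizers.concatChar is an assumption.
  val sep = "_"

  // Hypothetical tokenizer: lowercase, split on whitespace.
  def tokens(text: String): Array[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty)

  def ngramsSketch(
      text: String,
      n: Int,
      cutoffOption: Option[Int],
      allowTruncated: Boolean,
      highFreqs: Map[String, Int]
  ): Iterator[String] = {
    val chunks = tokens(text)
    if (chunks.length < n) {
      // Too few words for a full n-gram: emit the short one only if allowed.
      if (allowTruncated && chunks.nonEmpty) Iterator(chunks.mkString(sep))
      else Iterator.empty
    } else {
      chunks.indices.iterator.filter(_ + n <= chunks.length).map { start =>
        var end = start + n
        def bucket = chunks.slice(start, end).mkString(sep)
        // Keep adding words while the n-gram is still at or above the cutoff.
        while (cutoffOption.exists(c => highFreqs.getOrElse(bucket, 0) >= c) &&
               end < chunks.length) {
          end += 1
        }
        bucket
      }
    }
  }
}
```

    With allowTruncated = true, the text "local backbones" and n = 3 yield the single n-gram "local_backbones", matching the example in the description. With a cutoff, an n-gram listed in highFreqs at or above the cutoff is extended by further words until it is no longer listed or the text runs out.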

  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. def simple3TitlePrefix(text: String): List[String]

    This is used in V1.

  32. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  33. def tailNgrams(text: String, n: Int, cutoffOption: Option[Int], allowTruncated: Boolean = defaultAllowTruncated, highFreqs: Map[String, Int] = highFreqTitleNgrams, upto: Int = defaultUpto): Iterator[String]

  34. def titleNgrams(title: String, upto: Int, allowTruncated: Boolean = defaultAllowTruncated): Iterator[String]

  35. def titleTailNgrams(title: String, upto: Int = 1, allowTruncated: Boolean = defaultAllowTruncated): Iterator[String]

  36. def toBucket(s: String): String

  37. def toBucket(words: Iterable[String]): String

  38. def toString(): String

    Definition Classes
    AnyRef → Any
  39. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  41. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  42. def words(text: String, maxCount: Int = 40): Array[String]

    Returns the array of tokens for the given input, limited to at most maxCount tokens.
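
    As a rough illustration of the token limit, a stand-in for this method might look like the sketch below. The name wordsSketch is hypothetical, and the tokenization rule (lowercase, split on anything that is not a letter or digit) is an assumption; the real normalization in Bucketizers may differ.

```scala
object WordsSketch {
  // Assumption: tokens are lowercase runs of letters or digits.
  def wordsSketch(text: String, maxCount: Int = 40): Array[String] =
    text.toLowerCase
      .split("[^\\p{L}\\p{N}]+") // split on runs of non-letter, non-digit characters
      .filter(_.nonEmpty)        // drop the empty token a leading separator produces
      .take(maxCount)            // keep at most maxCount tokens
}
```

    For example, WordsSketch.wordsSketch("Local Backbones: a survey", 2) keeps only the first two tokens, "local" and "backbones".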
