org.allenai.nlpstack.parse.poly.ml

DatastoreGoogleNGram


case class DatastoreGoogleNGram(groupName: String, artifactName: String, version: Int, frequencyCutoff: Int) extends Product with Serializable

A class that parses Google syntactic n-gram data (http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html) to provide information about a requested n-gram. It takes the datastore location details of a data directory (groupName, artifactName, version) and parses each file in that directory. Each file is expected to be in the following format (from https://docs.google.com/document/d/14PWeoTkrnKk9H8_7CfVbdvuoFZ7jYivNTkBX2Hj7qLw/edit):

head_word<TAB>syntactic-ngram<TAB>total_count<TAB>counts_by_year

The counts_by_year field is a tab-separated list of year<comma>count items. Years are sorted in ascending order, and only years with non-zero counts are included.

The syntactic-ngram field is a space-separated list of tokens, each in the format “word/pos-tag/dep-label/head-index”. The word field can contain any non-whitespace character. The other fields can contain any non-whitespace character except ‘/’. pos-tag is a Penn Treebank part-of-speech tag, and dep-label is a Stanford basic dependencies label. head-index is an integer pointing to the head of the current token: “1” refers to the first token in the list, “2” to the second, and “0” indicates that the head is the root of the fragment.
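The line format above can be illustrated with a minimal Scala sketch that parses one line into its fields. This is not the nlpstack implementation: `SyntacticToken`, `NgramRecord`, and `GoogleNgramLineParser` are hypothetical names introduced here, and the sketch assumes well-formed input. Because only the word field may contain ‘/’, each token is split on its last three slashes, working from the right.

```scala
// Hypothetical sketch (not the nlpstack implementation): parses one line of the
// syntactic-ngrams format. All names here are illustrative.

final case class SyntacticToken(word: String, posTag: String, depLabel: String, headIndex: Int)

final case class NgramRecord(
    headWord: String,
    tokens: Seq[SyntacticToken],
    totalCount: Long,
    countsByYear: Seq[(Int, Long)]
)

object GoogleNgramLineParser {

  /** Parses "word/pos-tag/dep-label/head-index".
    * Only the word may contain '/', so the three separators are located from the right.
    */
  def parseToken(token: String): SyntacticToken = {
    val i3 = token.lastIndexOf('/')
    val i2 = token.lastIndexOf('/', i3 - 1)
    val i1 = token.lastIndexOf('/', i2 - 1)
    // i1 > 0 guarantees three separators and a non-empty word
    require(i1 > 0, s"malformed token: $token")
    SyntacticToken(
      word = token.substring(0, i1),
      posTag = token.substring(i1 + 1, i2),
      depLabel = token.substring(i2 + 1, i3),
      headIndex = token.substring(i3 + 1).toInt
    )
  }

  /** Parses head_word<TAB>syntactic-ngram<TAB>total_count<TAB>counts_by_year. */
  def parseLine(line: String): NgramRecord = {
    val fields = line.split("\t")
    require(fields.length >= 3, s"expected at least 3 tab-separated fields: $line")
    // syntactic-ngram: space-separated tokens
    val tokens = fields(1).split(" ").toSeq.map(parseToken)
    // counts_by_year: the remaining tab-separated "year,count" items
    val countsByYear = fields.drop(3).toSeq.map { item =>
      val parts = item.split(",")
      (parts(0).toInt, parts(1).toLong)
    }
    NgramRecord(fields(0), tokens, fields(2).toLong, countsByYear)
  }
}
```

Splitting from the right means a token such as `1/2/CD/num/0` yields the word `1/2` instead of failing on the extra slash.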

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new DatastoreGoogleNGram(groupName: String, artifactName: String, version: Int, frequencyCutoff: Int)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. val artifactName: String

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. val frequencyCutoff: Int

  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. val googleNgramDir: File

  12. val googleNgramPath: Path

  13. val groupName: String

  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. val ngramMap: Map[String, Seq[NgramInfo]]

  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  20. val version: Int

  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
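The `ngramMap` member can be illustrated with a hypothetical sketch that builds a `Map[String, Seq[...]]` from lines in the format described above. The sketch rests on two assumptions that this page does not confirm: the map is keyed by head word, and `frequencyCutoff` is a minimum `total_count`. `NgramEntry` and `NgramMapSketch` are illustrative stand-ins, because this page does not document the fields of the real `NgramInfo`.

```scala
// Hypothetical sketch of building a lookup map like `ngramMap`.
// Assumptions (not confirmed by the nlpstack docs): the key is the head word, and
// `frequencyCutoff` is a minimum total_count below which entries are dropped.
// `NgramEntry` is a stand-in for NgramInfo, whose fields are not documented here.

final case class NgramEntry(syntacticNgram: String, totalCount: Long)

object NgramMapSketch {

  /** Builds headWord -> entries from raw lines of the form
    * head_word<TAB>syntactic-ngram<TAB>total_count<TAB>counts_by_year...
    */
  def buildNgramMap(lines: Iterator[String], frequencyCutoff: Int): Map[String, Seq[NgramEntry]] = {
    val kept = lines.flatMap { line =>
      val fields = line.split("\t")
      if (fields.length < 3) None // skip malformed lines
      else {
        val total = fields(2).toLong
        // keep only n-grams whose total count reaches the cutoff
        if (total >= frequencyCutoff) Some(fields(0) -> NgramEntry(fields(1), total))
        else None
      }
    }
    // group surviving (headWord, entry) pairs by head word
    kept.toVector.groupBy(_._1).map { case (head, pairs) => head -> pairs.map(_._2) }
  }
}
```

Because filtering happens before grouping, a head word whose n-grams all fall below the cutoff is absent from the map, so lookups should use `get` or `getOrElse` rather than `apply`.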
