Class

com.getjenny.manaus.util

Bags

case class Bags(bags: List[(List[String], Set[String])]) extends Product with Serializable

Given bags of (key)words, computes the most important 2-tuples and 3-tuples.

Builds 4 co-location matrices for bigrams:

* m11(a, b): how often both terms "a" and "b" appear
* m10(a, b): "a" appears, "b" doesn't
* m10(a, b) = m01(b, a)
* m00(a, b): neither term appears

NB the keys are sometimes Set (m11 and m00), sometimes tuples (m10).

Similarly, for trigrams there are m111, m110, m100, m000.
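The bigram matrices described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the object name `CoOccurrenceSketch`, the `Bag` alias, and the choice to count once per bag are all assumptions, and `m10` is written as a function here although the class exposes it as a `Map`. It does keep the key convention noted above: unordered `Set` keys for m11 and m00, ordered arguments for m10.

```scala
object CoOccurrenceSketch {
  // Assumption: each bag is reduced to the set of distinct words it contains.
  type Bag = Set[String]

  /** m11: number of bags containing both words, keyed by an unordered pair. */
  def m11(bags: List[Bag]): Map[Set[String], Int] =
    bags
      .flatMap(bag => bag.subsets(2).toList) // every 2-word subset of the bag
      .groupBy(identity)
      .map { case (pair, hits) => pair -> hits.size }

  /** m10: number of bags containing `a` but not `b` (order matters). */
  def m10(bags: List[Bag], a: String, b: String): Int =
    bags.count(bag => bag.contains(a) && !bag.contains(b))

  /** m00: number of bags containing neither word of the pair. */
  def m00(bags: List[Bag], pair: Set[String]): Int =
    bags.count(bag => bag.intersect(pair).isEmpty)
}
```

For any pair, m11 + m10(a, b) + m10(b, a) + m00 adds up to the number of bags, which is a convenient sanity check on the four counts.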

The matrices above are used to compute:

* llrSignificativeBigrams: significant bigrams according to the log-likelihood ratio (LLR) score
* binomialSignificativeBigrams: significant bigrams, measuring a higher-than-expected frequency of one word compared to the other
* trinomialSignificativeTrigrams:

Created by Mario Alemi on 12/04/2017 in Jutai, Amazonas, Brazil

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new Bags(bags: List[(List[String], Set[String])])


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. val bags: List[(List[String], Set[String])]

  6. val bigramWord: List[(Set[String], String)]

  7. val binomialSignificativeBigrams: List[(List[String], Double)]


    Here we take "A" to be more frequent than "B". We then treat occurrence(A) as the number of samples, and each co-occurrence A && B as a success.

    This gives an approximate result; binomialSignificativeBigramsExact computes the surprise more rigorously.
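
    The sampling scheme described here can be illustrated with a small binomial sketch. Everything in it is an assumption made for illustration: the object name `BinomialSurpriseSketch`, the use of the upper-tail probability P(X >= k), and taking the negative natural logarithm as the "surprise". The source does not state which score the class actually returns.

    ```scala
    object BinomialSurpriseSketch {
      /** ln(n!) as a sum of logarithms; adequate for small illustrative counts. */
      private def logFactorial(n: Int): Double =
        (1 to n).map(i => math.log(i.toDouble)).sum

      /** P(X = k) for X ~ Binomial(n, p), assuming 0 < p < 1 and 0 <= k <= n. */
      def pmf(k: Int, n: Int, p: Double): Double = {
        val logChoose = logFactorial(n) - logFactorial(k) - logFactorial(n - k)
        math.exp(logChoose + k * math.log(p) + (n - k) * math.log(1 - p))
      }

      /** Surprise of seeing at least k successes in n samples: -ln P(X >= k). */
      def surprise(k: Int, n: Int, p: Double): Double =
        -math.log((k to n).map(pmf(_, n, p)).sum)
    }
    ```

    Under the reading above, n would be occurrence(A), k the number of bags where A and B co-occur, and p an estimate of P(B) such as the fraction of bags containing B; that mapping is itself an assumption.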

  8. val binomialSignificativeBigramsExact: List[(Set[String], Double)]


    Here we compute the surprise in a more rigorous way. Consider the bigram (A, B). Assuming A and B occur independently, we construct three events:

    None=(1-P(A))*(1-P(B)), JustOne=P(A)+P(B)-2*P(A)*P(B), Bigram=P(A)*P(B).

    These three probabilities sum to 1. Given the total number of bigrams (nBigrams), we can then count "None" with m00, "JustOne" with m10+m01, and "Bigram" with m11.

    The surprise is then computed for the observed number of bigrams.
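
    A minimal sketch of the three-event construction follows. It takes "None" as the complement of the other two events, (1-P(A))*(1-P(B)), so that the three probabilities sum to 1, and scores the observed counts with a multinomial probability. The object name `ExactBigramSketch`, the multinomial form, and the use of -ln as the surprise are assumptions, not the class's confirmed behaviour.

    ```scala
    object ExactBigramSketch {
      /** (None, JustOne, Bigram) probabilities, assuming A and B are independent. */
      def eventProbabilities(pA: Double, pB: Double): (Double, Double, Double) = {
        val pNone = (1 - pA) * (1 - pB)
        val pJustOne = pA + pB - 2 * pA * pB
        val pBigram = pA * pB
        (pNone, pJustOne, pBigram)
      }

      /** ln(n!) as a sum of logarithms; adequate for small illustrative counts. */
      private def logFactorial(n: Int): Double =
        (1 to n).map(i => math.log(i.toDouble)).sum

      /** -ln of the multinomial probability of the observed counts.
        * Assumes 0 < pA < 1 and 0 < pB < 1 so that every logarithm is finite. */
      def surprise(nNone: Int, nJustOne: Int, nBigram: Int, pA: Double, pB: Double): Double = {
        val (pNone, pJustOne, pBigram) = eventProbabilities(pA, pB)
        val n = nNone + nJustOne + nBigram
        val logCoeff = logFactorial(n) - logFactorial(nNone) -
          logFactorial(nJustOne) - logFactorial(nBigram)
        -(logCoeff + nNone * math.log(pNone) +
          nJustOne * math.log(pJustOne) + nBigram * math.log(pBigram))
      }
    }
    ```

    In terms of the matrices, nNone would come from m00, nJustOne from m10 + m01, and nBigram from m11.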

  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. def llr2(bigram: Set[String]): Double


    The log-likelihood ratio (LLR) score for a bigram.
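
    For orientation, a log-likelihood ratio score for a bigram can be computed from the four co-occurrence counts with the standard G-squared statistic for a 2x2 contingency table. The sketch below is that textbook formula, not necessarily what llr2 does internally; the object name `LlrSketch` and its parameter names are hypothetical.

    ```scala
    object LlrSketch {
      /** One term k * ln(k / expected), using the convention 0 * ln(0) = 0. */
      private def term(k: Int, expected: Double): Double =
        if (k == 0) 0.0 else k * math.log(k / expected)

      /** G-squared statistic for the 2x2 table (k11, k10, k01, k00). */
      def llr(k11: Int, k10: Int, k01: Int, k00: Int): Double = {
        val n = (k11 + k10 + k01 + k00).toDouble
        val rowA = k11 + k10    // observations with "a"
        val rowNotA = k01 + k00 // observations without "a"
        val colB = k11 + k01    // observations with "b"
        val colNotB = k10 + k00 // observations without "b"
        2.0 * (
          term(k11, rowA * colB / n) +
          term(k10, rowA * colNotB / n) +
          term(k01, rowNotA * colB / n) +
          term(k00, rowNotA * colNotB / n)
        )
      }
    }
    ```

    The score is 0 when the observed counts match what independence predicts and grows as "a" and "b" co-occur more (or less) often than expected.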

  15. val llr2Matrix: Map[Set[String], Double]

  16. val llrSignificativeBigrams: List[(Set[String], Double)]

  17. def m00(bigram: Set[String]): Int

  18. def m000(t: List[String]): Int

  19. val m10: Map[(String, String), Int]


    How often does the first word appear without the second word, across all bags? NB: the keys of this map are not two-element Sets but 2-tuples. Of course, m10(a, b) = m01(b, a).

  20. val m100: Map[(String, String, String), Int]

  21. val m11: Map[Set[String], Int]

  22. val m110: Map[(String, String, String), Int]

  23. val m111: Map[Set[String], Int]

  24. val n: Int

  25. val nBigrams: Int

  26. val nTrigrams: Int

  27. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  28. final def notify(): Unit

    Definition Classes
    AnyRef
  29. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  30. val occurrences: Map[String, Int]

  31. val orderedBigrams: List[List[String]]

  32. val orderedTrigrams: List[List[String]]

  33. val sb: List[Set[String]]

  34. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  35. val trigrams: Set[Set[String]]

  36. def trinomialSignificativeTrigrams(): Map[Set[String], Double]

  37. val vocabulary: Set[String]

  38. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
