Here we consider "A" more frequent than "B". We then treat occurrence(A) as the samples and the co-occurrence A && B as the successes.
This gives an approximate result for binomialSignificativeBigrams.
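A minimal sketch of this approximation, with assumed names and signatures (not the repository's actual API): the bags containing A are the trials, the bags containing both A and B the successes, and P(B) the per-trial success probability; the surprise is the upper-tail binomial probability, computed in log space for stability.

```scala
object BinomialSurprise {
  // log C(n, k), computed incrementally to avoid overflow
  private def logChoose(n: Int, k: Int): Double =
    (1 to k).map(i => math.log(n - k + i) - math.log(i)).sum

  /** Log of P(X >= successes) for X ~ Binomial(trials, p); assumes 0 < p < 1. */
  def logSurprise(trials: Int, successes: Int, p: Double): Double = {
    val logTerms = (successes to trials).map { k =>
      logChoose(trials, k) + k * math.log(p) + (trials - k) * math.log(1 - p)
    }
    val m = logTerms.max // log-sum-exp over the upper tail, for stability
    m + math.log(logTerms.map(t => math.exp(t - m)).sum)
  }
}
```

For a bigram (A, B) this would be called as logSurprise(occurrence(A), m11(A, B), P(B)), where all three argument names are the hypothetical ones above.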
Here we compute the surprise in a more rigorous way. Consider the bigram (A, B). We construct three events:
None = (1 - P(A)) * (1 - P(B)), JustOne = P(A) + P(B) - 2 * P(A) * P(B), Bigram = P(A) * P(B).
The three probabilities sum to 1, so they define a trinomial distribution.
Given the total number of bigrams (nBigrams), we can then match "None" with m00, "JustOne" with m10 + m01, and "Bigram" with m11.
We then compute the surprise for that number of bigrams, as sketched below.
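As a sketch (the object and method names here are assumptions), the surprise of observing the split (m00, m10 + m01, m11) over nBigrams bags can be scored as the log of the trinomial, i.e. three-cell multinomial, pmf:

```scala
object TrinomialSurprise {
  private def logFactorial(n: Int): Double = (2 to n).map(i => math.log(i)).sum

  /** Log of the multinomial pmf at `counts`; assumes strictly positive probs. */
  def logPmf(counts: Seq[Int], probs: Seq[Double]): Double = {
    require(counts.size == probs.size, "one probability per event")
    logFactorial(counts.sum) - counts.map(logFactorial).sum +
      counts.zip(probs).map { case (k, p) => k * math.log(p) }.sum
  }
}
```

Here the call would be logPmf(Seq(m00, m10 + m01, m11), Seq(pNone, pJustOne, pBigram)), with the three event probabilities defined as above.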
The LLR (log-likelihood ratio) score for bigrams
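Assuming the score follows Dunning's standard log-likelihood ratio (G²) over the 2×2 contingency table formed by the four matrices described below, a sketch (the repository's exact implementation may differ):

```scala
object LlrScore {
  private def xLogX(x: Long): Double =
    if (x == 0) 0.0 else x * math.log(x.toDouble)

  // "entropy" in the unnormalized form used by Dunning's G^2
  private def entropy(ks: Long*): Double = xLogX(ks.sum) - ks.map(xLogX).sum

  /** G^2 for the table (m11, m10 / m01, m00); larger = more surprising. */
  def llr(m11: Long, m10: Long, m01: Long, m00: Long): Double = {
    val rowEntropy    = entropy(m11 + m10, m01 + m00)
    val columnEntropy = entropy(m11 + m01, m10 + m00)
    val matrixEntropy = entropy(m11, m10, m01, m00)
    math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy))
  }
}
```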
How often does the first word appear while the second doesn't, across all bags? NB: the keys of the map here are not Set2 but Tuple2. Of course, m10(a, b) = m01(b, a).
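Since a bag contains "a" without "b" exactly when it contains "a" but not both, m10 can be derived from the unigram counts and m11. A sketch with assumed types (unigram counts keyed by word, m11 keyed by two-element Sets):

```scala
def m10(counts: Map[String, Int],
        m11: Map[Set[String], Int]): Map[(String, String), Int] =
  m11.iterator.flatMap { case (pair, both) =>
    val List(a, b) = pair.toList
    // m10(a, b) = bags containing "a" minus bags containing both; symmetrically for m01
    List((a, b) -> (counts(a) - both), (b, a) -> (counts(b) - both))
  }.toMap
```

This enumerates only pairs that co-occur at least once; for a pair that never co-occurs, m10(a, b) is simply the unigram count of "a".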
Given bags of (key)words, computes the most important 2-tuples and 3-tuples.
Makes four collocation matrices for bigrams:
* m11(a, b): how often both terms "a" and "b" appear
* m10(a, b): "a" appears, "b" doesn't
* m01(a, b) = m10(b, a): "b" appears, "a" doesn't
* m00(a, b): neither appears
NB: the keys are sometimes Sets (m11 and m00), sometimes Tuple2s (m10 and m01).
Similarly, for trigrams there are m111, m110, m100, and m000.
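A sketch of how m11 and m00 could be built (types and signatures are assumptions): m11 by enumerating the two-element subsets of each bag, m00 by inclusion-exclusion from the number of bags and the unigram counts.

```scala
def m11(bags: Seq[Set[String]]): Map[Set[String], Int] =
  bags.flatMap(_.subsets(2)) // every two-element subset of every bag
    .groupBy(identity).view.mapValues(_.size).toMap

def m00(nBags: Int, counts: Map[String, Int],
        m11: Map[Set[String], Int]): Map[Set[String], Int] =
  m11.map { case (pair, both) =>
    val List(a, b) = pair.toList
    // inclusion-exclusion: neither = total - with(a) - with(b) + with(both)
    pair -> (nBags - counts(a) - counts(b) + both)
  }
```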
The matrices above are used to compute:
* llrSignificativeBigrams: significant bigrams according to the log-likelihood ratio (LLR) score
* binomialSignificativeBigrams: significant bigrams, measuring a higher-than-expected frequency of one word compared to the other
* trinomialSignificativeTrigrams: significant trigrams, via the three-event (trinomial) test described above
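A hypothetical usage sketch; only the three method names come from this documentation, while the class name and constructor are assumptions:

```scala
val bags: Seq[Set[String]] = Seq(
  Set("deep", "learning", "network"),
  Set("deep", "learning", "gradient"),
  Set("river", "boat", "forest")
)
val col = new Collocations(bags) // assumed class name and constructor
col.llrSignificativeBigrams.take(10).foreach(println)
col.binomialSignificativeBigrams.take(10).foreach(println)
col.trinomialSignificativeTrigrams.take(10).foreach(println)
```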
Created by Mario Alemi on 12/04/2017 in Jutai, Amazonas, Brazil