Package

com.getjenny.manaus

util

package util

Created by Mario Alemi on 06/04/2017.

Linear Supertypes
AnyRef, Any

Type Members

  1. case class Bags(bags: List[(List[String], Set[String])]) extends Product with Serializable

    Given bags of (key)words, computes most important 2-tuples and 3-tuples

    Makes 4 co-location matrices for bigrams:

    * m11(a, b): how often both terms "a" and "b" appear
    * m10(a, b): "a" appears, "b" doesn't
    * m01(a, b): the mirror case, since m10(a, b) = m01(b, a)
    * m00(a, b): neither term appears

    NB the keys are sometimes Set (m11 and m00), sometimes tuples (m10).

    Similarly for tri-grams there are m111, m110, m100, m000.

    The matrices above are used to compute:

    * llrSignificativeBigrams: significant bigrams according to the log-likelihood score
    * binomialSignificativeBigrams: significant bigrams, measured by a higher-than-expected frequency of one word compared to the other
    * trinomialSignificativeTrigrams:

    Created by Mario Alemi on 12/04/2017 in Jutai, Amazonas, Brazil
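The bigram matrices described for Bags can be illustrated with a minimal sketch. It is not the library's implementation: the object name CoOccurrenceSketch, the method counts, and the simplified input type List[Set[String]] (one set of keywords per bag) are all assumptions made for illustration. It follows the note above in keying m11 and m00 by Set and m10 by ordered tuple.

```scala
// Hypothetical sketch, not the code of com.getjenny.manaus.util.Bags.
object CoOccurrenceSketch {
  /** For each pair of vocabulary terms, counts the bags in which
    * both appear (m11), only the first appears (m10), or neither appears (m00).
    */
  def counts(bags: List[Set[String]])
      : (Map[Set[String], Int], Map[(String, String), Int], Map[Set[String], Int]) = {
    val vocab = bags.flatten.distinct.sorted

    // Unordered pairs: m11 and m00 are symmetric, so a Set key is enough.
    val unordered = for { a <- vocab; b <- vocab if a < b } yield (a, b)
    val m11 = unordered.map { case (a, b) =>
      Set(a, b) -> bags.count(bag => bag(a) && bag(b))
    }.toMap
    val m00 = unordered.map { case (a, b) =>
      Set(a, b) -> bags.count(bag => !bag(a) && !bag(b))
    }.toMap

    // Ordered pairs: m10 is not symmetric, so the key is a tuple.
    val ordered = for { a <- vocab; b <- vocab if a != b } yield (a, b)
    val m10 = ordered.map { case (a, b) =>
      (a, b) -> bags.count(bag => bag(a) && !bag(b))
    }.toMap

    (m11, m10, m00)
  }
}
```

For any pair (a, b), the four counts m11, m10(a, b), m10(b, a) and m00 partition the bags, so they add up to the number of bags. That gives a quick sanity check on the matrices.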

  2. case class Binomial(samples: Long, successes: Double) extends Product with Serializable

    Builds a Binomial prior. Successes should be Int, but are typed as Double for more flexibility.

    Created by Mario Alemi on 06/04/2017.

  3. case class Trinomial(samples: Int, successes: List[Double]) extends Product with Serializable

    Trinomial distribution

    Created by Mario Alemi on 16/04/2017 in Manaus, Amazonas, Brazil.

Value Members

  1. def bigramSet2tuple(bigram: Set[String]): (String, String)

  2. def bigramSet2tupleInverted(bigram: Set[String]): (String, String)

  3. def binomialFactor(n: Int, k: Int): Double

  4. def expectedTriOccurrence(n: Int, k1: Int, k2: Int): List[Double]

    Given the occurrences of two words k1 and k2 in a sample of n bigrams, computes the expected relative frequencies:

    None/n=(1-P(1))*(1-P(2)), JustOne/n=P(1)+P(2)-2*P(1)*P(2), Bigram/n=P(1)*P(2)

    The three outcomes are exhaustive and mutually exclusive, so these fractions sum to 1.
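A minimal sketch of this computation, under stated assumptions rather than as the library's code: P(i) is taken to be ki/n, the two words are treated as independent, and the result is ordered as (none, just one, bigram). The "none" term is written as (1-P(1))*(1-P(2)), the form under which the three fractions sum to 1. The names ExpectedTriSketch and expected are hypothetical.

```scala
// Hypothetical sketch, not the code of expectedTriOccurrence.
object ExpectedTriSketch {
  /** Expected relative frequencies of: neither word, exactly one word, both words,
    * assuming P(i) = ki / n and independence between the two words.
    */
  def expected(n: Int, k1: Int, k2: Int): List[Double] = {
    require(n > 0, "sample size must be positive")
    val p1 = k1.toDouble / n
    val p2 = k2.toDouble / n
    val none = (1 - p1) * (1 - p2)        // neither word occurs
    val justOne = p1 + p2 - 2 * p1 * p2   // exactly one of the two occurs
    val bigram = p1 * p2                  // both occur
    List(none, justOne, bigram)
  }
}
```

With n = 10, k1 = 5 and k2 = 2 this gives P(1) = 0.5 and P(2) = 0.2, so the expected fractions are 0.4, 0.5 and 0.1 up to floating-point rounding.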

  5. val factorial: Map[Int, Double]

  6. def multinomialFactor(n: Int, k: List[Int]): Double

  7. def splitSentences(line: String): List[((String, String, List[String]), Int)]

    Ad-hoc tokenizer for our (private) test data.

    line

    A string with the conversation in this format: """ "CLIENT: I want to renew a subscription...";"AGENT: Sure, tell me your name..."\n """

    returns

    List(List("CLIENT", "I want to renew a subscription..."), List("AGENT", "Sure, tell me your name..."))
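The input format above can be parsed along the following lines. This is only a sketch: the object SplitSketch and the method speakerTurns are hypothetical, and the result is a simplified List[(String, String)] of (speaker, text) pairs rather than the type in the splitSentences signature, whose exact contents are not documented here.

```scala
// Hypothetical sketch, not the code of splitSentences.
object SplitSketch {
  /** Splits a line such as "CLIENT: ...";"AGENT: ..." into (speaker, text) pairs. */
  def speakerTurns(line: String): List[(String, String)] =
    line.trim
      .split("\";\"")                                   // turns are separated by ";" between quotes
      .toList
      .map(_.stripPrefix("\"").stripSuffix("\""))       // drop the outermost quotes
      .map(_.split(":", 2))                             // split on the first colon only
      .collect { case Array(speaker, text) => (speaker.trim, text.trim) }
}
```

Splitting on the first colon only keeps any later colons inside the utterance, and turns without a speaker prefix are dropped rather than guessed at.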
