com.eharmony.aloha.models.reg

PolynomialEvaluationAlgo

trait PolynomialEvaluationAlgo extends AnyRef

An algorithm for efficiently evaluating polynomials at a given point. First-order polynomials are a special case, and an important one: evaluating a first-order polynomial is equivalent to computing a linear regression prediction, which may be the standard use case.

As an example, imagine that we wanted to evaluate Z(u,v,x,y) = w_{u,v} uv + w_{u,v,x} uvx + w_{u,v,y} uvy for coefficients W = [w_{u,v}, w_{u,v,x}, w_{u,v,y}]^T.

This is:

Z(u,v,x,y) = uv (w_{u,v} + w_{u,v,x} x + w_{u,v,y} y)

That Z can be factored indicates there is a way to structure the algorithm to reuse computations efficiently. The way to achieve this is to arrange the terms of the polynomial as a tree, then traverse and aggregate over the tree, accumulating the computation along the way. As a concrete example, let's express the polynomial above in real code; the motivation for the example follows.

The computation tree works as follows: the edge labels are multiplied by the associated coefficient (0 if non-existent) to get the node values. Node values are added together to get the inner product. So, every time we descend farther into the tree, we multiply the running product by the value we extract from the input vector X, and every time a weight is found, it is multiplied by the current product and added to the running sum. The process recurses until the tree can no longer be traversed. The sum is then returned.

//           u            u      v                                u     v     x
//    (1)*1.00      (1*1.00)*1.000        u     v    w1     (1*1.00*1.000)*0.75        u     v    x      w2
//   ----------> 0 ----------------> 1*1.00*1.000 * 0.5 ------------------------> 1*1.00*1.000*0.75 * 0.111
//                                                      \
//                                                       \        u     v     y
//                                                        \ (1*1.00*1.000)*0.25        u     v    y       w3
//                                                         ---------------------> 1*1.00*1.000*0.25 * 0.4545
//
//         u *     v *  w1    +       u *     v *    x *    w2   +       u *     v *    y *     w3
val Z = 1.00 * 1.000 * 0.5    +    1.00 * 1.000 * 0.75 * 0.111   +    1.00 * 1.000 * 0.25 * 0.4545

val X = IndexedSeq(
          Seq(("a=1", 1.00)),                  // u
          Seq(("b=1", 1.000)),                 // v
          Seq(("c=1", 0.75), ("c=2", 0.25)))   // x and y, respectively

val W1 =
  PolynomialEvaluator(Coefficient(0, IndexedSeq(0)), Map(
    "a=1" -> PolynomialEvaluator(Coefficient(0, IndexedSeq(1)), Map(
      "b=1" -> PolynomialEvaluator(Coefficient(0.5, IndexedSeq(2)), Map(   // w1
        "c=1" -> PolynomialEvaluator(Coefficient(0.111)),                  // w2
        "c=2" -> PolynomialEvaluator(Coefficient(0.4545))))))))            // w3


assert(Z == (W1 at X))
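The traversal described above can be sketched independently of the library. The following is a minimal illustration, not Aloha's implementation: `Node`, `featureGroups`, and `eval` are hypothetical names standing in for the tree of coefficients and the depth-first accumulation.

```scala
object PolySketch {
  // Hypothetical stand-in for a node of the polynomial tree (not an Aloha class).
  final case class Node(
      weight: Double,                           // coefficient at this node (0 if the term is absent)
      featureGroups: Seq[Int] = Nil,            // indices into x that may contain child keys
      children: Map[String, Node] = Map.empty)  // descendants keyed by feature name

  // Depth-first accumulation. `prod` is the product of the feature values
  // on the path from the root to `t`.
  def eval(x: IndexedSeq[Iterable[(String, Double)]], t: Node, prod: Double = 1.0): Double = {
    var sum = t.weight * prod
    for (i <- t.featureGroups if i < x.size; (k, v) <- x(i); child <- t.children.get(k))
      sum += eval(x, child, prod * v)
    sum
  }

  val x = IndexedSeq(
    Seq("a=1" -> 1.0),                  // u
    Seq("b=1" -> 1.0),                  // v
    Seq("c=1" -> 0.75, "c=2" -> 0.25))  // x and y, respectively

  val w = Node(0.0, Seq(0), Map(
    "a=1" -> Node(0.0, Seq(1), Map(
      "b=1" -> Node(0.5, Seq(2), Map(   // w1
        "c=1" -> Node(0.111),           // w2
        "c=2" -> Node(0.4545)))))))     // w3

  // 0.5 + 0.75 * 0.111 + 0.25 * 0.4545, i.e. approximately 0.696875
  val z = eval(x, w)
}
```

The product `u * v` is computed once on the way down and shared by all three terms, which is the reuse that the factored form of Z suggests.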

While it is entirely possible to construct a PolynomialEvaluator directly, it is less straightforward than using a builder. Below, we show a better way to construct PolynomialEvaluator instances, in which we specify only the terms of the polynomial and the associated coefficient values. Note that linear regression is the special case in which every inner map contains exactly one element.

val W2 = (PolynomialEvaluator.builder ++= Map(
  Map("a=1" -> 0, "b=1" -> 1            ) -> 0.5,
  Map("a=1" -> 0, "b=1" -> 1, "c=1" -> 2) -> 0.111,
  Map("a=1" -> 0, "b=1" -> 1, "c=2" -> 2) -> 0.4545
)).result

assert(W2 == W1)

Notice the values in the inner maps look a little weird. These are the indices into the input vector X from which each key comes. They are there for efficiency: they allow the algorithm to dramatically prune the search space while accumulating over the tree.
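To make the role of the indices concrete, here is one way a tree could be assembled from such a term map. This is a sketch under assumptions, not necessarily how PolynomialEvaluator.builder works: `Node`, `insert`, and `build` are hypothetical names, and ordering each term's keys by index is an assumption made so that terms sharing a prefix share a path.

```scala
object BuilderSketch {
  // Hypothetical stand-in for a node of the polynomial tree (not an Aloha class).
  final case class Node(weight: Double, featureGroups: Seq[Int], children: Map[String, Node])

  private val empty = Node(0.0, Vector.empty, Map.empty)

  // Insert one term below `t`. `path` holds the term's (key, index) pairs sorted by index.
  def insert(t: Node, path: List[(String, Int)], w: Double): Node = path match {
    case Nil => t.copy(weight = w)                 // end of the term: store its coefficient
    case (k, i) :: rest =>
      val child = t.children.getOrElse(k, empty)
      // Record index i so that evaluation only looks at feature groups that can match.
      Node(t.weight, (t.featureGroups :+ i).distinct, t.children.updated(k, insert(child, rest, w)))
  }

  def build(terms: Map[Map[String, Int], Double]): Node =
    terms.foldLeft(empty) { case (root, (term, w)) =>
      insert(root, term.toList.sortBy(_._2), w)
    }

  val root = build(Map(
    Map("a=1" -> 0, "b=1" -> 1) -> 0.5,
    Map("a=1" -> 0, "b=1" -> 1, "c=1" -> 2) -> 0.111,
    Map("a=1" -> 0, "b=1" -> 1, "c=2" -> 2) -> 0.4545))
}
```

In this sketch each node records only the indices of the feature groups in which its children's keys can occur, so an evaluator never has to scan the other groups of the input.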

Self Type
PolynomialEvaluationAlgo with MapTreeLike[String, Coefficient]
Linear Supertypes
AnyRef, Any
Known Subclasses

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def at(x: IndexedSeq[Iterable[(String, Double)]]): Double

    Evaluate the polynomial represented by this object at the point x.

    x

    A representation of a point. The outer indexed sequence represents groups of features in feature space, where each group is generated by the same feature expander. Each inner value is an iterable sequence of key-value pairs. This representation is used because it allows efficient encoding of sparse feature spaces. For instance, we can very easily encode multi-part bag-of-words models as follows. Let's say index 0 in the outer sequence represents the title of an HTML document and index 1 represents the text inside the document's body tags.

    case class Doc(title: String, body: String) {
      def multiPartBagOfWords = IndexedSeq(tokens(title), tokens(body))
      private[this] def tokens(s: String) = s.trim.toLowerCase.split("\\W+").groupBy(identity).mapValues(_.size.toDouble)
    }
    
    val fox = Doc("fox story", "The quick brown fox jumps over the lazy dog")
    val p: PolynomialEvaluationAlgo = ...
    p at fox.multiPartBagOfWords
    returns

    this polynomial evaluated at x.
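To show what such a point looks like in memory, the following self-contained sketch builds the two-part bag-of-words representation for the fox document. The object name `BagOfWordsSketch` is hypothetical, and the tokenizer is a simplified variant of the one above that also drops empty tokens.

```scala
object BagOfWordsSketch {
  // Lower-case, split on non-word characters, and count occurrences of each token.
  def tokens(s: String): Map[String, Double] =
    s.trim.toLowerCase.split("\\W+").filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.length.toDouble) }

  // Index 0 holds the title features, index 1 the body features.
  val x: IndexedSeq[Iterable[(String, Double)]] =
    IndexedSeq(tokens("fox story"), tokens("The quick brown fox jumps over the lazy dog"))
}
```

Only the tokens that actually occur are stored, so the size of the point grows with the length of the document rather than with the size of the vocabulary.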

  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. final def hBig(x: IndexedSeq[Iterable[(String, Double)]], s: Stack[(MapTreeLike[String, Coefficient], Double)], sum: Double): Double

    A tail-recursive variant of hSmall that won't stack overflow on deep trees. This function is slower in empirical tests than hSmall.

    x

    an input vector

    s

    a stack of trees and current sums at those trees (replaces call stack in non-tail-recursive hSmall).

    sum

    the current sum.

    returns

    this polynomial evaluated at x.

    Attributes
    protected[this]
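The idea of replacing the call stack with an explicit one can be sketched as follows. This is an illustration under assumptions, not the library's hBig: `Node` and `evalIter` are hypothetical names, and an immutable List is used as the stack.

```scala
import scala.annotation.tailrec

object StackSketch {
  // Hypothetical stand-in for a node of the polynomial tree (not an Aloha class).
  final case class Node(
      weight: Double,
      featureGroups: Seq[Int] = Nil,
      children: Map[String, Node] = Map.empty)

  // `stack` holds (subtree, product of feature values on the path to it);
  // `sum` is the value accumulated so far.
  @tailrec
  def evalIter(
      x: IndexedSeq[Iterable[(String, Double)]],
      stack: List[(Node, Double)],
      sum: Double = 0.0): Double =
    stack match {
      case Nil => sum
      case (t, prod) :: rest =>
        // Children reachable through a feature present in x, each with its updated product.
        val pushed = for {
          i <- t.featureGroups.toList if i < x.size
          (k, v) <- x(i).toList
          child <- t.children.get(k).toList
        } yield (child, prod * v)
        evalIter(x, pushed ::: rest, sum + t.weight * prod)
    }
}
```

A call such as `StackSketch.evalIter(x, List((root, 1.0)))` starts the traversal at a root node with an initial product of 1. Because the pending work lives on the heap, the depth of the tree no longer limits the evaluation, at the cost of allocating a tuple and a list cell per visited node.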
  14. final def hSmall(x: IndexedSeq[Iterable[(String, Double)]], xSize: Int, t: MapTreeLike[String, Coefficient], prod: Double): Double

    Accumulate, in a depth-first way, the evaluation of the polynomial.

    NOTE: This algorithm is recursive (but not tail-recursive) but seems to be about 20% faster than the following tail-recursive equivalent. This is probably due to additional object creation in the tail-recursive method: Tuple2 instances must be created, and the linked list containers probably need to be created as well. In the non-tail-recursive version, we don't do any of this and only perform arithmetic operations (aside from the iterators and the Options created in the descendant lookup).

    We aren't too afraid of stack overflows because these trees will typically be shallow: most use cases don't involve polynomials in thousands of variables (or of degree in the thousands). That being said, stack overflows are a real possibility (in tests, they occurred at a depth of ~3000). To combat this, hBig is provided but not yet integrated into the at function.

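    import collection.mutable.{Stack => MStack}

    // s holds (subtree, product of the feature values on the path to it); z is the running sum.
    def h3(x: IndexedSeq[Seq[(String, Double)]], s: MStack[(MapTreeLike[String, Value], Double)], z: Double): Double = {
      if (s.isEmpty) z
      else {
        val h = s.pop
        // Push every child reachable through a feature present in x, scaling the product by that feature's value.
        for (i <- h._1.value.applicableFeatures; p <- x(i); c <- h._1.descendants.get(p._1)) s.push((c, p._2 * h._2))
        // Add this node's contribution, then continue with the rest of the stack.
        h3(x, s, z + h._1.value.weight * h._2)
      }
    }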
    x

    an input vector

    xSize

    size of input vector (computed once for possible speed improvement).

    t

    a tree containing the information necessary to compute the higher order inner product

    prod

    product of the items in the path from the root to leaf.

    returns

    this polynomial evaluated at x.

    Attributes
    protected[this]
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  21. def toString(): String

    Definition Classes
    AnyRef → Any
  22. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from AnyRef

Inherited from Any

Ungrouped