com.eharmony.aloha.dataset.vw.multilabel

VwDownsampledMultilabelRowCreator

final case class VwDownsampledMultilabelRowCreator[-A, K](allLabelsInTrainingSet: IndexedSeq[K], featuresFunction: FeatureExtractorFunction[A, Sparse], defaultNamespace: List[Int], namespaces: List[(String, List[Int])], normalizer: Option[(CharSequence) ⇒ CharSequence], positiveLabelsFunction: GenAggFunc[A, IndexedSeq[K]], classNs: Char, dummyClassNs: Char, numDownsampledNegLabels: Int, seedCreator: () ⇒ Long, includeZeroValues: Boolean = false) extends StatefulRowCreator[A, Array[String], Long] with Logging with Product with Serializable

Creates training data for multilabel models in Vowpal Wabbit's CSOAA LDF and WAP LDF formats for the JNI. In this row creator, negative labels are downsampled and the costs of the downsampled labels are adjusted to produce an unbiased estimator. It is assumed that negative labels are in the majority. Downsampling negatives can reduce training time and may also improve model performance. See the following resources for intuition:

Because this row creator is stateful, the caller is required to maintain state. However, if it is only called via an iterator or a sequence, the row creator can maintain the state itself while iterating. For iterators the mapping is non-strict; for sequences (Seq) it is strict.
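The state threading described above can be sketched without Aloha's types. This is a minimal illustration, not the library's StatefulRowCreator: the object and method names are hypothetical, and `f` stands in for `apply(a, seed)`, returning an output together with the next seed. The iterator version is non-strict (`f` runs only as elements are consumed), while the sequence version is strict.

```scala
// Hypothetical sketch: `f` stands in for a stateful row creator's apply(a, seed),
// returning an output together with the next seed.
object StatefulMapSketch {
  // Non-strict: `f` is only invoked as the returned iterator is consumed.
  def statefulMapIter[A, B](as: Iterator[A], init: Long)(f: (A, Long) => (B, Long)): Iterator[(B, Long)] =
    new Iterator[(B, Long)] {
      private var state = init
      def hasNext: Boolean = as.hasNext
      def next(): (B, Long) = {
        val (b, newState) = f(as.next(), state)
        state = newState // the next element sees the state produced by this one
        (b, newState)
      }
    }

  // Strict: every element is processed before the result is returned.
  def statefulMapSeq[A, B](as: Seq[A], init: Long)(f: (A, Long) => (B, Long)): Vector[(B, Long)] =
    statefulMapIter(as.iterator, init)(f).toVector
}
```

Each result carries the state to feed into the next call, which is exactly the bookkeeping a caller must do by hand when invoking a stateful row creator directly.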

A

the input type

K

the label or class type

allLabelsInTrainingSet

all labels in the training set. This is a sequence because order matters. Order here can be chosen arbitrarily, but it must be consistent in the training and test formulation.

featuresFunction

features to extract from the data of type A.

defaultNamespace

list of feature indices in the default VW namespace.

namespaces

a mapping from VW namespace name to feature indices in that namespace.

normalizer

can modify VW output (currently unused)

positiveLabelsFunction

a function that extracts the positive class labels from an input of type A.

classNs

the namespace name for class information.

dummyClassNs

the namespace name for dummy class information. 2 dummy classes are added to make the predicted probabilities work.

numDownsampledNegLabels

a positive value representing the number of negative labels to include in each row. If this is greater than or equal to the number of negative labels for a given row, then no downsampling of negatives takes place.

seedCreator

a "function" that creates a seed used for randomness. Its implementation is important: it should produce a unique value for each unit of parallelism. For example, if row creation is parallelized across multiple threads on one machine, the unit of parallelism is the thread, and seedCreator should produce a unique value for each thread. If row creation is parallelized across multiple machines, seedCreator should produce a unique value for each machine. If row creation is parallelized across machines and across threads on each machine, seedCreator should produce a unique value for each thread on each machine. Otherwise, the same random sequence will be repeated ("striped") across units of parallelism, so the samples drawn by different units will not be independent.

includeZeroValues

whether to include zero-valued features in the VW input.

Since

11/6/2017
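The negative-label downsampling and cost adjustment described above can be illustrated with a standalone sketch. It is not Aloha's implementation: the object and function names are hypothetical, and the uniform sampling without replacement and the n/k weight (the standard inverse-inclusion-probability, or Horvitz-Thompson, correction) are assumptions; the exact cost adjustment used by this class may differ. With n negatives and k = numDownsampledNegLabels, each negative is kept with probability k/n, so scaling each kept negative by n/k makes its expected contribution equal to its contribution without downsampling.

```scala
import scala.util.Random

// Hypothetical sketch of negative-label downsampling; not Aloha's implementation.
object NegativeDownsamplingSketch {
  /** Samples at most `k` negative label indices (labels in 0 until numLabels that are
    * not in `positives`) and attaches a weight to each. Returns the weighted negatives
    * and a new seed, mirroring the (output, newState) shape of a stateful row creator.
    */
  def downsampleNegatives(numLabels: Int, positives: Set[Int], k: Int, seed: Long): (Vector[(Int, Double)], Long) = {
    require(k > 0, "k must be positive")
    val negatives = (0 until numLabels).filterNot(positives.contains).toVector
    val n = negatives.size
    val rng = new Random(seed)
    val weighted =
      if (n <= k) negatives.map(i => (i, 1.0)) // too few negatives: keep them all, unweighted
      else {
        // Each negative is kept with probability k/n, so weight each kept one by n/k.
        val w = n.toDouble / k
        rng.shuffle(negatives).take(k).sorted.map(i => (i, w))
      }
    (weighted, rng.nextLong()) // the next seed is drawn from the same generator
  }
}
```

Returning a fresh seed alongside the sample is what makes such a step fit the stateful apply(a, seed) contract: the same input and seed always yield the same sample.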

Linear Supertypes
Product, Equals, Logging, StatefulRowCreator[A, Array[String], Long], Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new VwDownsampledMultilabelRowCreator(allLabelsInTrainingSet: IndexedSeq[K], featuresFunction: FeatureExtractorFunction[A, Sparse], defaultNamespace: List[Int], namespaces: List[(String, List[Int])], normalizer: Option[(CharSequence) ⇒ CharSequence], positiveLabelsFunction: GenAggFunc[A, IndexedSeq[K]], classNs: Char, dummyClassNs: Char, numDownsampledNegLabels: Int, seedCreator: () ⇒ Long, includeZeroValues: Boolean = false)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. val allLabelsInTrainingSet: IndexedSeq[K]

    all labels in the training set. This is a sequence because order matters. Order here can be chosen arbitrarily, but it must be consistent in the training and test formulation.

  7. def apply(a: A, seed: Long): ((MissingAndErroneousFeatureInfo, Option[Array[String]]), Long)

    Given an a and a seed, produce the output along with a new seed.

    When using this function, the caller is responsible for keeping track of and providing the seeds.

    The implementation of this function should be referentially transparent.

    a

    input

    seed

    the random seed, which is updated on each call.

    returns

    a tuple whose first element is a Tuple2 of the missing and erroneous feature information and an optional result, and whose second element is the new state (seed).

    Definition Classes
    VwDownsampledMultilabelRowCreator → StatefulRowCreator
  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. val classNs: Char

    the namespace name for class information.

  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. final def debug(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Issue a debug logging message, with an exception.

    msg

    the message object. toString() is called to convert it to a loggable string.

    t

    the exception to include with the logged message.

    Attributes
    protected[this]
    Definition Classes
    Logging
  12. final def debug(msg: ⇒ Any): Unit

    Issue a debug logging message.

    msg

    the message object. toString() is called to convert it to a loggable string.

    Attributes
    protected[this]
    Definition Classes
    Logging
  13. val defaultNamespace: List[Int]

    list of feature indices in the default VW namespace.

  14. val dummyClassNs: Char

    the namespace name for dummy class information. 2 dummy classes are added to make the predicted probabilities work.

  15. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. final def error(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Issue an error logging message, with an exception.

    msg

    the message object. toString() is called to convert it to a loggable string.

    t

    the exception to include with the logged message.

    Attributes
    protected[this]
    Definition Classes
    Logging
  17. final def error(msg: ⇒ Any): Unit

    Issue an error logging message.

    msg

    the message object. toString() is called to convert it to a loggable string.

    Attributes
    protected[this]
    Definition Classes
    Logging
  18. val featuresFunction: FeatureExtractorFunction[A, Sparse]

    features to extract from the data of type A.

  19. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  20. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  21. val includeZeroValues: Boolean

    include zero values in VW input?

  22. final def info(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Issue an info logging message, with an exception.

    msg

    the message object. toString() is called to convert it to a loggable string.

    t

    the exception to include with the logged message.

    Attributes
    protected[this]
    Definition Classes
    Logging
  23. final def info(msg: ⇒ Any): Unit

    Issue an info logging message.

    msg

    the message object. toString() is called to convert it to a loggable string.

    Attributes
    protected[this]
    Definition Classes
    Logging
  24. lazy val initialState: Long

    Some initial state that can be used on the very first call to apply(A, S).

    returns

    some state.

    Definition Classes
    VwDownsampledMultilabelRowCreator → StatefulRowCreator
  25. final def isDebugEnabled: Boolean

    Determine whether debug logging is enabled.

    Attributes
    protected[this]
    Definition Classes
    Logging
  26. final def isErrorEnabled: Boolean

    Determine whether error logging is enabled.

    Attributes
    protected[this]
    Definition Classes
    Logging
  27. final def isInfoEnabled: Boolean

    Determine whether info logging is enabled.

    Attributes
    protected[this]
    Definition Classes
    Logging
  28. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  29. final def isTraceEnabled: Boolean

    Determine whether trace logging is enabled.

    Attributes
    protected[this]
    Definition Classes
    Logging
  30. final def isWarnEnabled: Boolean

    Determine whether warn logging is enabled.

    Attributes
    protected[this]
    Definition Classes
    Logging
  31. final lazy val logger: Logger

    The logger is a @transient lazy val so that it works properly with Spark. It is not serialized with the rest of the class into which this trait is mixed.

    Attributes
    protected[this]
    Definition Classes
    Logging
  32. def loggerInitName(): String

    The name with which the logger is initialized. This can be overridden in a derived class.

    returns

    Attributes
    protected
    Definition Classes
    Logging
  33. final def loggerName: String

    Get the name associated with this logger.

    returns

    the name.

    Attributes
    protected[this]
    Definition Classes
    Logging
  34. val namespaces: List[(String, List[Int])]

    a mapping from VW namespace name to feature indices in that namespace.

  35. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  36. val normalizer: Option[(CharSequence) ⇒ CharSequence]

    can modify VW output (currently unused)

  37. final def notify(): Unit

    Definition Classes
    AnyRef
  38. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  39. val numDownsampledNegLabels: Int

    a positive value representing the number of negative labels to include in each row. If this is greater than or equal to the number of negative labels for a given row, then no downsampling of negatives takes place.

  40. val positiveLabelsFunction: GenAggFunc[A, IndexedSeq[K]]

    a function that extracts the positive class labels from an input of type A.

  41. val seedCreator: () ⇒ Long

    a "function" that creates a seed used for randomness. Its implementation is important: it should produce a unique value for each unit of parallelism. For example, if row creation is parallelized across multiple threads on one machine, the unit of parallelism is the thread, and seedCreator should produce a unique value for each thread. If row creation is parallelized across multiple machines, seedCreator should produce a unique value for each machine. If row creation is parallelized across machines and across threads on each machine, seedCreator should produce a unique value for each thread on each machine. Otherwise, the same random sequence will be repeated ("striped") across units of parallelism, so the samples drawn by different units will not be independent.

  42. def statefulMap[In <: Seq[A], Out](as: SeqLike[A, In], state: Long)(implicit cbf: CanBuildFrom[In, ((MissingAndErroneousFeatureInfo, Option[Array[String]]), Long), Out]): Out

    Apply the apply(A, S) method to the elements of the sequence. In the first application of apply(A, S), state will be used as the state. In subsequent applications, the state will come from the state generated in the output of the previous application of apply(A, S).

    NOTE: This method isn't really parallelizable via chunking. The way to parallelize this method is to provide a separate starting state for each unit of parallelism.

    For more information, see com.eharmony.aloha.util.StatefulMapOps

    as

    input to map.

    state

    the initial state to use at the start of mapping.

    cbf

    object responsible for building the output collection.

    returns

    Definition Classes
    StatefulRowCreator
  43. def statefulMap(as: Iterator[A], state: Long): Iterator[((MissingAndErroneousFeatureInfo, Option[Array[String]]), Long)]

    Apply the apply(A, S) method to the elements of the iterator. In the first application of apply(A, S), state will be used as the state. In subsequent applications, the state will come from the state generated in the output of the previous application of apply(A, S).

    For more information, see com.eharmony.aloha.util.StatefulMapOps

    as

    the input iterator. Note that the first element of as is forced by this method in order to construct the output.

    state

    the initial state to use at the start of the iterator.

    returns

    an iterator in which each element a is mapped to a (MissingAndErroneousFeatureInfo, Option[Array[String]]), paired with the state produced in the process.

    Definition Classes
    StatefulRowCreator
  44. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  45. final def trace(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Issue a trace logging message, with an exception.

    msg

    the message object. toString() is called to convert it to a loggable string.

    t

    the exception to include with the logged message.

    Attributes
    protected[this]
    Definition Classes
    Logging
  46. final def trace(msg: ⇒ Any): Unit

    Issue a trace logging message.

    msg

    the message object. toString() is called to convert it to a loggable string.

    Attributes
    protected[this]
    Definition Classes
    Logging
  47. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  48. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  49. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  50. final def warn(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Issue a warn logging message, with an exception.

    msg

    the message object. toString() is called to convert it to a loggable string.

    t

    the exception to include with the logged message.

    Attributes
    protected[this]
    Definition Classes
    Logging
  51. final def warn(msg: ⇒ Any): Unit

    Issue a warn logging message.

    msg

    the message object. toString() is called to convert it to a loggable string.

    Attributes
    protected[this]
    Definition Classes
    Logging
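The seedCreator description and the statefulMap note both come down to giving each unit of parallelism its own starting seed. The sketch below shows one way a seedCreator could do that; it is an assumption, not how Aloha derives seeds, and `partitionId` is a hypothetical caller-supplied identifier that must be unique per machine (or per partition).

```scala
// Hypothetical sketch: not Aloha's implementation. `partitionId` is an assumed,
// caller-supplied identifier that is unique per machine (or per partition).
object SeedCreatorSketch {
  // Packs the partition id into the high 32 bits and the current thread's id into
  // the low 32 bits, so seeds differ across machines and across threads as long as
  // partition ids are distinct and thread ids fit in 32 bits.
  def seedCreator(partitionId: Int): () => Long =
    () => (partitionId.toLong << 32) | (Thread.currentThread().getId & 0xFFFFFFFFL)
}
```

Because the thread id is read when the function is invoked, not when it is created, the same seedCreator value yields different seeds on different threads of the same partition.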
