Object

com.salesforce.op.utils.stats

OpStatistics

Related Doc: package stats


object OpStatistics

Linear Supertypes
AnyRef, Any

Type Members

  1. case class ChiSquaredResults(cramersV: Double, chiSquaredStat: Double, pValue: Double) extends Product with Serializable

    Case class for holding results of the Chi-squared statistical test we use for calculating Cramer's V

    cramersV

    Cramer's V value

    chiSquaredStat

    Actual Chi-squared statistic

    pValue

    P-value
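
    As a rough illustration of how the Chi-squared statistic and Cramer's V relate, the sketch below computes both from a contingency table held in plain Scala arrays, using V = sqrt(chi2 / (n * min(rows - 1, cols - 1))). `CramersVSketch` and its methods are hypothetical names, not part of OpStatistics. The sketch does not filter empty rows or columns, and it omits the p-value, which needs a chi-squared distribution from a statistics library.

    ```scala
    object CramersVSketch {
      /** Pearson chi-squared statistic for a table given as rows of counts. */
      def chiSquared(table: Array[Array[Double]]): Double = {
        val rowSums = table.map(_.sum)
        val colSums = table.transpose.map(_.sum)
        val total = rowSums.sum
        val terms = for {
          i <- table.indices
          j <- colSums.indices
          expected = rowSums(i) * colSums(j) / total
          if expected > 0.0 // skip cells whose row or column is empty
        } yield math.pow(table(i)(j) - expected, 2) / expected
        terms.sum
      }

      /** Cramer's V: chi-squared normalized by sample size and table shape. */
      def cramersV(table: Array[Array[Double]]): Double = {
        val total = table.map(_.sum).sum
        val numCols = table.headOption.map(_.length).getOrElse(0)
        val minDim = math.min(table.length, numCols) - 1
        if (total <= 0.0 || minDim <= 0) 0.0
        else math.sqrt(chiSquared(table) / (total * minDim))
      }
    }
    ```

    For the table Array(Array(10.0, 0.0), Array(0.0, 10.0)), where each feature value maps to exactly one label, this gives a statistic of 20 and a Cramer's V of 1.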

  2. case class ConfidenceResults(maxConfidences: Array[Double], supports: Array[Double]) extends Product with Serializable

    Container for association rule confidence and supports

    maxConfidences

    Array of maximum confidence values, one per contingency matrix row

    supports

    Array of support values for each categorical value, one per contingency matrix row
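
    The following sketch shows one plausible way to derive these two arrays from a contingency matrix. It assumes that the confidence of a row is the largest share of that row falling under a single label, and that the support of a row is the share of all counts that the row holds. Both definitions, and the names `ConfidenceSketch` and `Confidence`, are assumptions made for illustration and are not taken from the OpStatistics source.

    ```scala
    object ConfidenceSketch {
      // Hypothetical stand-in for ConfidenceResults
      final case class Confidence(maxConfidences: Array[Double], supports: Array[Double])

      def confidences(table: Array[Array[Double]]): Confidence = {
        val total = table.map(_.sum).sum
        // Largest label share within each row (counts are assumed non-negative)
        val maxConfidences = table.map { row =>
          val rowSum = row.sum
          val rowMax = row.foldLeft(0.0)((a, b) => math.max(a, b))
          if (rowSum == 0.0) 0.0 else rowMax / rowSum
        }
        // Share of all observations that fall in each row
        val supports = table.map(row => if (total == 0.0) 0.0 else row.sum / total)
        Confidence(maxConfidences, supports)
      }
    }
    ```

    Both arrays have one entry per contingency matrix row, matching the field descriptions above.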

  3. case class ContingencyStats(chiSquaredResults: ChiSquaredResults, pointwiseMutualInfo: LabelWiseValues.Type, contingencyMatrix: LabelWiseValues.Type, mutualInfo: Double, confidenceResults: ConfidenceResults) extends Product with Serializable

    Container class for statistics calculated from contingency matrices constructed from categorical variables

    chiSquaredResults

    Chi-squared test results for the given contingency matrix

    pointwiseMutualInfo

    Map between feature name in feature vector and map of pointwise mutual information values between that feature and all values the label can take

    contingencyMatrix

    Actual (unfiltered) contingency matrix that the rest of the results are calculated from

    mutualInfo

    Mutual information between the feature and the label, as a single value

    confidenceResults

    Association rule details (confidences + supports)
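
    To make the two information-theoretic fields concrete, here is a minimal sketch of pointwise mutual information and mutual information computed from a contingency table. It assumes base-2 logarithms, keys the result by label column index, and expects every row and column to have a non-zero total. These choices, and the name `MutualInfoSketch`, are illustrative assumptions rather than the library's behavior.

    ```scala
    object MutualInfoSketch {
      private def log2(x: Double): Double = math.log(x) / math.log(2.0)

      /** PMI per label: key is the column index, value has one entry per row. */
      def pointwiseMutualInfo(table: Array[Array[Double]]): Map[String, Array[Double]] = {
        val rowSums = table.map(_.sum)
        val colSums = table.transpose.map(_.sum)
        val total = rowSums.sum
        colSums.indices.map { j =>
          val perRow = table.indices.map { i =>
            // log2( p(i,j) / (p(i) * p(j)) ) written in terms of counts
            log2(table(i)(j) * total / (rowSums(i) * colSums(j)))
          }.toArray
          j.toString -> perRow
        }.toMap
      }

      /** Mutual information: PMI averaged over the joint distribution. */
      def mutualInfo(table: Array[Array[Double]]): Double = {
        val total = table.map(_.sum).sum
        val pmi = pointwiseMutualInfo(table)
        val terms = for {
          i <- table.indices
          j <- table(i).indices
          if table(i)(j) > 0.0 // empty cells contribute nothing
        } yield (table(i)(j) / total) * pmi(j.toString)(i)
        terms.sum
      }
    }
    ```

    Cells with a zero count get a pointwise value of negative infinity in this sketch and are skipped when summing the mutual information.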

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. object LabelWiseValues

    Two-element result tuple containing a map of labels to values, used e.g. for pointwise mutual information or the contingency matrix itself.

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def computeCorrelationsWithLabel(featuresAndLabel: RDD[Vector], colStats: MultivariateStatisticalSummary, numOfRows: Long): Array[Double]

    Assumes that a MultivariateStatisticalSummary has already been computed on the RDD, so that information can be reused here. This defines an RDD aggregation that calculates all the correlations with the label. Data is assumed to be laid out in an RDD[org.apache.spark.mllib.linalg.Vector] where the label is the last element.

    featuresAndLabel

    Input RDD in which each element is a single vector containing the feature values, with the label as the last element

    returns

    Array of correlations of each feature vector element with the label
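
    The layout described above (label stored as the last vector element) can be illustrated with a small local sketch that computes the Pearson correlation of every feature column with the label. It works on an in-memory Seq[Array[Double]] instead of an RDD and recomputes the column means itself instead of reading them from a precomputed summary, so it shows the arithmetic only, not the distributed aggregation. `LabelCorrelationSketch` is a hypothetical name.

    ```scala
    object LabelCorrelationSketch {
      /** Pearson correlation of each feature column with the last (label) column. */
      def correlationsWithLabel(rows: Seq[Array[Double]]): Array[Double] = {
        require(rows.nonEmpty, "need at least one row")
        val n = rows.size.toDouble
        val dim = rows.head.length
        val labelIdx = dim - 1
        // Column means and sums of squared deviations
        val means = Array.tabulate(dim)(k => rows.map(r => r(k)).sum / n)
        val sqDev = Array.tabulate(dim) { k =>
          rows.map(r => math.pow(r(k) - means(k), 2)).sum
        }
        Array.tabulate(labelIdx) { k =>
          val coDev = rows.map { r =>
            (r(k) - means(k)) * (r(labelIdx) - means(labelIdx))
          }.sum
          // A constant column yields 0 / 0, which is NaN
          coDev / math.sqrt(sqDev(k) * sqDev(labelIdx))
        }
      }
    }
    ```

    The returned array has one entry per feature, in the same order as the feature elements of the input vectors.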

  8. def contingencyStats(contingency: Matrix): ContingencyStats

    Calculates all of the statistics we use that come from contingency matrices between categorical features and categorical labels and stores them in a ContingencyStats case class.

    contingency

    Matrix of co-occurrences of feature values with label values. Each row represents a different feature choice, while each column represents a different label value.

    returns

    ContingencyStats object containing all the statistics we calculate from contingency matrices

  9. def contingencyStatsFromMultiPickList(contingency: Matrix, labelCounts: Array[Double]): ContingencyStats

    Same as contingencyStats method, but specialized to MultiPickLists. The standard contingency table stats are not technically valid for MultiPickLists because the choices are not independent from each other (multipicklists are multi-hot encoded instead of one-hot encoded).

    There are several strategies to deal with this to calculate statistics similar to Cramer's V. We follow https://cran.r-project.org/web/packages/MRCV/vignettes/MRCV-vignette.pdf for inspiration, but use a slightly different scheme where we compute stats from a 2 x numLabels contingency matrix for each choice separately, and take the max of these Cramer's V values (one per choice) as the Cramer's V value for the entire MultiPickList. See BadFeatureZooTest for testing how this performs on different types of relations between MultiPickLists and the label.

    contingency

    Matrix of co-occurrences of feature values with label values. Each row represents a different feature choice, while each column represents a different label value.

    labelCounts

    Array of counts of each label, used to construct the 2 x numLabels contingency matrices for each choice

    returns

    ContingencyStats object containing all the statistics we calculate from contingency matrices
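
    The per-choice scheme described above can be sketched as follows. For each choice, the sketch builds a 2 x numLabels table whose first row holds the counts where the choice is present and whose second row holds the remainder of each label count, then takes the maximum Cramer's V over all choices. Deriving the second row as labelCounts minus the present counts is an assumption about how the label counts are used, and `MultiPickListSketch` is a hypothetical name.

    ```scala
    object MultiPickListSketch {
      /** Cramer's V of a contingency table given as rows of counts. */
      def cramersV(table: Array[Array[Double]]): Double = {
        val rowSums = table.map(_.sum)
        val colSums = table.transpose.map(_.sum)
        val total = rowSums.sum
        val minDim = math.min(table.length, colSums.length) - 1
        val chi2 = (for {
          i <- table.indices
          j <- colSums.indices
          expected = rowSums(i) * colSums(j) / total
          if expected > 0.0
        } yield math.pow(table(i)(j) - expected, 2) / expected).sum
        if (total <= 0.0 || minDim <= 0) 0.0 else math.sqrt(chi2 / (total * minDim))
      }

      /** Max over choices of Cramer's V on a (present, absent) x label table. */
      def maxCramersV(contingency: Array[Array[Double]], labelCounts: Array[Double]): Double =
        contingency.map { present =>
          // Assumed construction: rows of each label that lack this choice
          val absent = labelCounts.zip(present).map { case (n, k) => n - k }
          cramersV(Array(present, absent))
        }.foldLeft(0.0)((a, b) => math.max(a, b))
    }
    ```

    With label counts of (10, 10), a choice that appears in every row of the first label and no row of the second gives a per-choice value of 1, so the maximum is 1 regardless of how uninformative the other choices are.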

  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  20. def toString(): String

    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
