Package

com.salesforce.op.stages.impl

preparators

package preparators

Type Members

  1. case class CategoricalGroupStats(group: String, categoricalFeatures: Array[String], contingencyMatrix: Type, pointwiseMutualInfo: Type, cramersV: Double, mutualInfo: Double, maxRuleConfidences: Array[Double], supports: Array[Double]) extends MetadataLike with Product with Serializable

    Container for categorical stats coming from a single group (and therefore a single contingency matrix)

    group

    Indicator group for this contingency matrix

    categoricalFeatures

    Array of categorical features belonging to this group

    contingencyMatrix

    Contingency matrix for this feature group

    pointwiseMutualInfo

    Matrix of PMI values in Map form (label -> PMI values)

    cramersV

    Cramér's V value for this feature group (how strongly it is associated with the label)

    mutualInfo

    Mutual info value for this feature group

    maxRuleConfidences

    Array (one value per contingency matrix row) containing the largest association rule confidence for that row (over all the labels)

    supports

    Array (one value per contingency matrix row) containing the supports for each categorical choice (fraction of data in which it is chosen)
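
    The quantities described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the object name `ContingencySketch`, the plain `Array[Array[Double]]` matrix type, the row/column layout (rows are categorical choices, columns are label values), and the use of base-2 logarithms are all assumptions made for illustration.

```scala
// Hypothetical sketch: textbook definitions of the stats named above,
// computed from a contingency matrix of raw counts.
object ContingencySketch {
  // Assumed layout: one row per categorical choice, one column per label value.
  type Matrix = Array[Array[Double]]

  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  /** Fraction of all data falling in each row (one value per categorical choice). */
  def supports(m: Matrix): Array[Double] = {
    val total = m.map(_.sum).sum
    m.map(_.sum / total)
  }

  /** Largest association rule confidence P(label | row) per row, over all labels. */
  def maxRuleConfidences(m: Matrix): Array[Double] =
    m.map { row =>
      val s = row.sum
      if (s == 0.0) 0.0 else row.max / s
    }

  /** Pointwise mutual information log2(p(x, y) / (p(x) * p(y))) for every cell. */
  def pointwiseMutualInfo(m: Matrix): Matrix = {
    val total = m.map(_.sum).sum
    val rowP = m.map(_.sum / total)
    val colP = m.transpose.map(_.sum / total)
    Array.tabulate(m.length, colP.length) { (i, j) =>
      val p = m(i)(j) / total
      // Assumption: empty cells are reported as negative infinity.
      if (p == 0.0) Double.NegativeInfinity else log2(p / (rowP(i) * colP(j)))
    }
  }

  /** Mutual information: sum of p(x, y) * PMI(x, y) over non-empty cells. */
  def mutualInfo(m: Matrix): Double = {
    val total = m.map(_.sum).sum
    val pmi = pointwiseMutualInfo(m)
    (for {
      i <- m.indices
      j <- m(i).indices
      if m(i)(j) > 0.0
    } yield m(i)(j) / total * pmi(i)(j)).sum
  }

  /** Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1))). */
  def cramersV(m: Matrix): Double = {
    val total = m.map(_.sum).sum
    val rowS = m.map(_.sum)
    val colS = m.transpose.map(_.sum)
    val chi2 = (for {
      i <- m.indices
      j <- colS.indices
      expected = rowS(i) * colS(j) / total
      if expected > 0.0
    } yield math.pow(m(i)(j) - expected, 2) / expected).sum
    val dof = math.min(m.length - 1, colS.length - 1)
    if (dof <= 0) 0.0 else math.sqrt(chi2 / (total * dof))
  }
}
```

    For example, a matrix in which each categorical choice occurs with exactly one label gives a Cramér's V of 1, while a matrix with identical rows gives 0.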

  2. sealed trait CorrelationExclusion extends EnumEntry with Serializable

    Categories of feature vector columns to exclude from the feature-label correlation matrix (or just array of feature-label correlations) calculated in SanityChecker.

  3. sealed abstract class CorrelationType extends EnumEntry with Serializable

    Represents a kind of correlation coefficient.

  4. case class Correlations(featuresIn: Seq[String], values: Seq[Double], nanCorrs: Seq[String], corrType: CorrelationType) extends MetadataLike with Product with Serializable

    Correlations between features and the label from SanityChecker

    featuresIn

    names of features

    values

    correlation of feature with label

    nanCorrs

    names of features whose correlation with the label is NaN

    corrType

    type of correlation coefficient that was computed
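
    As a rough illustration of how these fields relate, the sketch below computes a Pearson correlation per feature column and separates out the features whose correlation is NaN (for example, constant columns). The object `CorrelationSketch` and its functions are hypothetical, and whether the real class omits NaN features from `featuresIn` is an assumption here, not something this page states.

```scala
// Hypothetical sketch, not the SanityChecker implementation.
object CorrelationSketch {
  /** Pearson correlation; evaluates to NaN when either input has zero variance. */
  def pearson(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.nonEmpty && x.length == y.length, "inputs must be non-empty and equal in length")
    val mx = x.sum / x.length
    val my = y.sum / y.length
    val cov = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum
    val sx = math.sqrt(x.map(a => (a - mx) * (a - mx)).sum)
    val sy = math.sqrt(y.map(b => (b - my) * (b - my)).sum)
    cov / (sx * sy) // 0.0 / 0.0 is NaN for a constant column
  }

  /** Returns (feature names, correlations, names of features with NaN correlation). */
  def correlate(
    columns: Seq[(String, Seq[Double])],
    label: Seq[Double]
  ): (Seq[String], Seq[Double], Seq[String]) = {
    val all = columns.map { case (name, col) => name -> pearson(col, label) }
    val (nan, ok) = all.partition(_._2.isNaN)
    (ok.map(_._1), ok.map(_._2), nan.map(_._1))
  }
}
```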

  5. class PredictionDeIndexer extends BinaryEstimator[RealNN, RealNN, Text] with SaveOthersParams

    Estimator which takes a response feature and a prediction feature as inputs. It deindexes the prediction by using the response's metadata.

    Input 1: response
    Input 2: prediction feature
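
    The deindexing idea can be sketched in a few lines of plain Scala: an indexed prediction is mapped back to the original label text by looking it up in the label array recorded for the response. The names `DeIndexSketch`, `deIndex`, and the `"UnseenLabel"` fallback are hypothetical stand-ins, and the sketch assumes the metadata boils down to an ordered array of label strings.

```scala
// Hypothetical sketch of deindexing a prediction; not the library's code.
object DeIndexSketch {
  /**
   * @param labels     label strings in index order (assumed to come from the response's metadata)
   * @param unseen     placeholder returned for indices outside the known labels (an assumption)
   * @param prediction indexed prediction value
   */
  def deIndex(labels: Array[String], unseen: String = "UnseenLabel")(prediction: Double): String = {
    val i = prediction.toInt
    if (i >= 0 && i < labels.length) labels(i) else unseen
  }
}
```

    With `labels = Array("no", "yes")`, a prediction of `1.0` maps back to `"yes"`, and an out-of-range index falls back to the placeholder.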

  6. final class PredictionDeIndexerModel extends BinaryModel[RealNN, RealNN, Text]

  7. class SanityChecker extends BinaryEstimator[RealNN, OPVector, OPVector] with SanityCheckerParams with AllowLabelAsInput[OPVector]

    The SanityChecker checks for potential problems with computed features in a supervised learning setting.

    There is an Estimator step, which outputs statistics on the incoming data, as well as the names of features which should be dropped from the feature vector. The transformer step then removes the offending features from the feature vector.
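
    The two-step pattern described above can be sketched without any library dependency. This is a simplified illustration, not the SanityChecker's actual logic: the object `SanityCheckSketch`, the `Model` class, and the two drop rules (a variance floor and a correlation ceiling, with caller-supplied thresholds) are assumptions chosen to keep the example small.

```scala
// Hypothetical sketch of an estimator step (fit) followed by a transformer step (transform).
object SanityCheckSketch {
  private def mean(xs: Seq[Double]): Double = xs.sum / xs.length

  /** Sample variance (n - 1 denominator); 0.0 for fewer than two values. */
  private def variance(xs: Seq[Double]): Double =
    if (xs.length < 2) 0.0
    else { val m = mean(xs); xs.map(x => (x - m) * (x - m)).sum / (xs.length - 1) }

  /** Pearson correlation; NaN when either input is constant. */
  private def pearson(x: Seq[Double], y: Seq[Double]): Double = {
    val (mx, my) = (mean(x), mean(y))
    val cov = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum
    val sx = math.sqrt(x.map(a => (a - mx) * (a - mx)).sum)
    val sy = math.sqrt(y.map(b => (b - my) * (b - my)).sum)
    cov / (sx * sy)
  }

  /** Transformer step: keeps only the columns that were not flagged during fit. */
  final case class Model(names: Seq[String], dropped: Seq[String]) {
    private val keep: Seq[Int] = names.indices.filterNot(i => dropped.contains(names(i)))
    def transform(row: Seq[Double]): Seq[Double] = keep.map(row)
  }

  /** Estimator step: computes per-column statistics and decides which features to drop. */
  def fit(
    names: Seq[String],
    rows: Seq[Seq[Double]],
    label: Seq[Double],
    minVariance: Double,   // hypothetical threshold
    maxCorrelation: Double // hypothetical threshold
  ): Model = {
    val cols = rows.transpose
    val dropped = names.zip(cols).collect {
      case (name, col)
          if variance(col) < minVariance || math.abs(pearson(col, label)) > maxCorrelation =>
        name
    }
    Model(names, dropped)
  }
}
```

    In this sketch a column that is nearly constant, or one that tracks the label almost perfectly (a sign of possible label leakage), is recorded during `fit` and removed from every row passed to `transform`.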

  8. final class SanityCheckerModel extends BinaryModel[RealNN, OPVector, OPVector] with AllowLabelAsInput[OPVector]

  9. trait SanityCheckerParams extends Params

  10. case class SanityCheckerSummary(correlationsWLabel: Correlations, dropped: Seq[String], featuresStatistics: SummaryStatistics, names: Seq[String], categoricalStats: Array[CategoricalGroupStats]) extends MetadataLike with Product with Serializable

    Case class to convert to and from SanityChecker summary metadata

    correlationsWLabel

    feature correlations with label

    dropped

    features dropped for label leakage

    featuresStatistics

    stats on features

    names

    names of features passed in

  11. case class SummaryStatistics(count: Double, sampleFraction: Double, max: Seq[Double], min: Seq[Double], mean: Seq[Double], variance: Seq[Double]) extends MetadataLike with Product with Serializable

    Statistics on features (zip each array with names in SanityCheckerSummary to associate the values with their features)

    count

    count of data in sample used to calculate stats

    sampleFraction

    fraction of total data used in calculation

    max

    max value seen

    min

    min value

    mean

    mean value

    variance

    variance of value
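
    The zipping mentioned in the description might look like the following sketch. `SummaryStatsSketch` and `FeatureStats` are hypothetical helpers that take the `names` sequence and the per-feature sequences as plain arguments, so the example stays independent of the library.

```scala
// Hypothetical sketch: pair each feature name with its entries in the statistics sequences.
object SummaryStatsSketch {
  final case class FeatureStats(max: Double, min: Double, mean: Double, variance: Double)

  def perFeature(
    names: Seq[String],
    max: Seq[Double],
    min: Seq[Double],
    mean: Seq[Double],
    variance: Seq[Double]
  ): Map[String, FeatureStats] = {
    require(
      Seq(max, min, mean, variance).forall(_.length == names.length),
      "every statistics sequence needs one entry per feature name"
    )
    names.indices.map { i =>
      names(i) -> FeatureStats(max(i), min(i), mean(i), variance(i))
    }.toMap
  }
}
```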

  12. case class CategoricalStats(categoricalFeatures: Array[String] = Array.empty, cramersVs: Array[Double] = Array.empty, pointwiseMutualInfos: Type = LabelWiseValues.empty, mutualInfos: Array[Double] = Array.empty, counts: Type = LabelWiseValues.empty) extends MetadataLike with Product with Serializable

    Container class for statistics calculated from contingency tables constructed from categorical variables

    categoricalFeatures

    Names of features that we performed categorical tests on

    cramersVs

    Values of cramersV for each feature (should be the same for everything coming from the same contingency matrix)

    pointwiseMutualInfos

    Map from label value (as a string) to an Array (over features) of PMI values

    mutualInfos

    Values of MI for each feature (should be the same for everything coming from the same contingency matrix)

    counts

    Counts of occurrence for categoricals (n x m array of arrays where n = number of labels and m = number of features + 1, with the last element being the occurrence count of labels)

    Annotations
    @deprecated
    Deprecated

    (Since version 3.3.0) Functionality replaced by Array[CategoricalGroupStats]

Value Members

  1. object CorrelationExclusion extends Enum[CorrelationExclusion] with Serializable

  2. object CorrelationType extends Enum[CorrelationType] with Serializable

  3. object SanityChecker extends Serializable

  4. object SanityCheckerNames extends Product with Serializable

    Contains all names for sanity checker metadata

  5. object SanityCheckerSummary extends Product with Serializable
