Object

com.databricks.labs.automl.feature

FeatureEvaluator


object FeatureEvaluator extends FeatureInteractionBase

Linear Supertypes
FeatureInteractionBase, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final val AGGREGATE_COLUMN: String

    Definition Classes
    FeatureInteractionBase
  5. final val COUNT_COLUMN: String

    Definition Classes
    FeatureInteractionBase
  6. final val ENTROPY_COLUMN: String

    Definition Classes
    FeatureInteractionBase
  7. final val FIELD_ENTROPY_COLUMN: String

    Definition Classes
    FeatureInteractionBase
  8. final val INDEXED_SUFFIX: String

    Definition Classes
    FeatureInteractionBase
  9. final val QUANTILE_PRECISION: Double

    Definition Classes
    FeatureInteractionBase
  10. final val QUANTILE_THRESHOLD: Double

    Definition Classes
    FeatureInteractionBase
  11. final val RATIO_COLUMN: String

    Definition Classes
    FeatureInteractionBase
  12. final val TOTAL_RATIO_COLUMN: String

    Definition Classes
    FeatureInteractionBase
  13. final val VARIANCE_STATISTIC: String

    Definition Classes
    FeatureInteractionBase
  14. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  15. def calculateCategoricalInformationGain(df: DataFrame, labelColumn: String, fieldToTest: String, totalRecordCount: Long): Double

    Helper method for calculating the Information Gain of a feature field

    df

    DataFrame that contains at least the fieldToTest and the label column

    labelColumn

    The label column of the data set

    fieldToTest

    The field to calculate Information Gain for

    totalRecordCount

    Total number of records in the data set

    returns

    The Information Gain of the field

    Since

    0.7.0
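    The calculation described above can be illustrated without Spark. The following is a minimal sketch on plain Scala collections, assuming base-2 Shannon entropy and the usual definition of Information Gain (parent entropy minus the record-weighted entropy of each category). InfoGainSketch, entropy, and informationGain are hypothetical names, not part of FeatureEvaluator; the library method operates on a DataFrame and its implementation may differ.

```scala
object InfoGainSketch {
  // Shannon entropy (base 2, an assumption) of a sequence of class labels.
  def entropy[A](labels: Seq[A]): Double = {
    val n = labels.size.toDouble
    labels.groupBy(identity).values.map { group =>
      val p = group.size / n
      -p * math.log(p) / math.log(2.0)
    }.sum
  }

  // Information Gain of a categorical field over (fieldValue, label) pairs:
  // H(label) - sum over categories v of (n_v / N) * H(label | field = v).
  def informationGain[F, L](rows: Seq[(F, L)]): Double = {
    val total = rows.size.toDouble
    val parentEntropy = entropy(rows.map(_._2))
    val weightedChildEntropy = rows.groupBy(_._1).values.map { group =>
      (group.size / total) * entropy(group.map(_._2))
    }.sum
    parentEntropy - weightedChildEntropy
  }
}
```

    A field that separates the labels perfectly scores the full parent entropy, while a field that carries no information about the label scores zero.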

  16. def calculateCategoricalVariance(df: DataFrame, labelColumn: String, fieldToTest: String): Double

    Method for calculating the variance of a categorical (nominal) field based on a post-split first-layer variance of the label column's values to determine the minimum variance achievable in the label column.

    df

    DataFrame that contains the label column and the field under test for minimum by-group variance

    labelColumn

    The label column of the data set

    fieldToTest

    The feature column to test for variance reduction

    returns

    The minimum split variance of the aggregated label data by nominal group of the fieldToTest

    Since

    0.7.0
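    As a rough illustration of the by-group variance described above, the sketch below groups label values by category and returns the smallest per-group variance. It assumes population variance (the library may use sample variance instead) and works on plain Scala collections rather than a DataFrame. CategoricalVarianceSketch, variance, and minGroupVariance are hypothetical names.

```scala
object CategoricalVarianceSketch {
  // Population variance of a sequence; 0.0 for an empty sequence.
  def variance(xs: Seq[Double]): Double =
    if (xs.isEmpty) 0.0
    else {
      val mean = xs.sum / xs.size
      xs.map(x => (x - mean) * (x - mean)).sum / xs.size
    }

  // Smallest label variance found in any single category of the field.
  // Input is a sequence of (fieldValue, label) pairs.
  def minGroupVariance[F](rows: Seq[(F, Double)]): Double = {
    require(rows.nonEmpty, "rows must not be empty")
    rows.groupBy(_._1).values.map(group => variance(group.map(_._2))).min
  }
}
```

    A category whose label values are all identical yields a variance of zero, which is the lowest value this sketch can return.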

  17. def calculateContinuousInformationGain(df: DataFrame, labelCol: String, fieldToTest: String, totalRecordCount: Long, bucketCount: Int): Double

    Helper method for calculating Information Gain on a classification data set when the feature is continuous (numeric). The continuous feature is split into the number of buckets configured in _continuousDiscretizerBucketCount, whose default is overridden with .setContinuousDiscretizerBucketCount(<Int>)

    df

    DataFrame that contains the feature to test and the label column

    labelCol

    The label column of the data set

    fieldToTest

    The feature field that is under test for entropy evaluation

    totalRecordCount

    Total number of elements in the data set

    bucketCount

    The number of quantized buckets into which the continuous feature is split

    returns

    Information Gain associated with the feature field based on splits that could occur.

    Since

    0.7.0

  18. def calculateContinuousVariance(df: DataFrame, labelColumn: String, fieldToTest: String, bucketCount: Int): Double

    Method for calculating the variance of a continuous field for variance reduction in the label column based on bucketized grouping of the field under test.

    df

    DataFrame that contains the label column and the field under test of continuous numeric type

    labelColumn

    The label column of the data set

    fieldToTest

    The continuous numeric field that needs to be evaluated

    bucketCount

    The number of quantized buckets into which the field under test is grouped, to simulate where a decision split would occur.

    returns

    The minimum split variance across the buckets that have been created

    Since

    0.7.0
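    The bucketized grouping described above might look like the following sketch. It assumes equal-frequency buckets formed by sorting on the feature value and slicing, and population variance of the label within each bucket; both are assumptions, and the library's DataFrame-based implementation may bucket differently. ContinuousVarianceSketch and its members are hypothetical names.

```scala
object ContinuousVarianceSketch {
  // Population variance of a sequence; 0.0 for an empty sequence.
  def variance(xs: Seq[Double]): Double =
    if (xs.isEmpty) 0.0
    else {
      val mean = xs.sum / xs.size
      xs.map(x => (x - mean) * (x - mean)).sum / xs.size
    }

  // Sorts (feature, label) pairs by feature, cuts the labels into at most
  // bucketCount slices of roughly equal size, and returns the smallest
  // label variance found in any slice.
  def minBucketVariance(rows: Seq[(Double, Double)], bucketCount: Int): Double = {
    require(rows.nonEmpty, "rows must not be empty")
    require(bucketCount > 0, "bucketCount must be positive")
    val sortedLabels = rows.sortBy(_._1).map(_._2)
    val bucketSize = math.ceil(sortedLabels.size.toDouble / bucketCount).toInt
    sortedLabels.grouped(bucketSize).map(variance).min
  }
}
```

    With a single bucket the result is simply the variance of the whole label column; adding buckets can only keep that minimum the same or lower it when the feature separates the label values.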

  19. def calculatePercentageChange(before: Double, after: Double): Double

    Method for calculating the percentage change in the score metric between a parent feature and an interaction feature, so that scores are compared on a normalized scale.

    before

    Score of a parent feature

    after

    Score of an interaction feature

    returns

    the percentage change

    Attributes
    protected[com.databricks.labs.automl.feature]
    Definition Classes
    FeatureInteractionBase
    Since

    0.6.2
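    For reference, a common way to express a percentage change is shown below. The formula, which divides by the absolute value of the parent score, is an assumption for illustration and is not confirmed to be the one used by FeatureInteractionBase. PercentageChangeSketch is a hypothetical name.

```scala
object PercentageChangeSketch {
  // Percentage change from `before` to `after`, relative to |before|.
  // A zero `before` is not guarded here and produces Infinity or NaN.
  def percentageChange(before: Double, after: Double): Double =
    (after - before) / math.abs(before) * 100.0
}
```

    Dividing by the absolute value keeps the sign of the result tied to the direction of the change even when the parent score is negative.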

  20. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. def discretizeContinuousFeature(df: DataFrame, fieldToTest: String, bucketCount: Int): DataFrame

    Helper method for converting a continuous feature to a discrete bucketed value so that entropy can be calculated effectively for the feature.

    df

    DataFrame containing at least the field to test in continuous numeric format

    fieldToTest

    The name of the field under conversion

    bucketCount

    The number of quantized buckets to create

    returns

    A DataFrame with the continuous value converted to a quantized bucket membership value.

    Since

    0.7.0
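    The conversion to bucket membership can be sketched on a plain sequence as follows. This version assigns equal-frequency bucket indexes by rank, which is one simple reading of quantized bucketing; the library method works on a DataFrame and may rely on a different discretizer. DiscretizeSketch and discretize are hypothetical names.

```scala
object DiscretizeSketch {
  // Returns, for each input value, an equal-frequency bucket index in
  // [0, bucketCount). Tied values may land in neighbouring buckets.
  def discretize(values: Seq[Double], bucketCount: Int): List[Int] = {
    require(bucketCount > 0, "bucketCount must be positive")
    val n = values.size
    // Original positions, ordered by ascending value.
    val order = values.zipWithIndex.sortBy(_._1).map(_._2)
    val buckets = Array.fill(n)(0)
    order.zipWithIndex.foreach { case (originalIndex, rank) =>
      buckets(originalIndex) = (rank.toLong * bucketCount / n).toInt
    }
    buckets.toList
  }
}
```

    The output keeps the input order, so each bucket index can be attached to its source record as a new discrete column.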

  22. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  24. def extractAndValidateSchema(schema: StructType, featureVector: String): Unit

    Helper method for extracting field names and ensuring that the feature vector is present

    schema

    Schema of the DataFrame undergoing feature interaction

    featureVector

    The name of the features column

    returns

    No value is returned (the signature declares Unit)

    Since

    0.6.2

  25. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  26. def generateInteractionCandidates(featureColumns: Array[ColumnTypeData]): Array[InteractionPayload]

    Method for generating a collection of Interaction Candidates to be tested and applied to the feature set if the tests for inclusion pass.

    featureColumns

    List of the columns that make up the feature vector

    returns

    Array of InteractionPayload values.

    Attributes
    protected[com.databricks.labs.automl.feature]
    Definition Classes
    FeatureInteractionBase
    Since

    0.6.2
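    One plausible way to build such a collection is to pair every feature column with every other column exactly once. The sketch below does this with simplified stand-in types: Column and Candidate are hypothetical and do not reflect the actual fields of ColumnTypeData or InteractionPayload, and the output naming scheme is invented for illustration.

```scala
object CandidateSketch {
  // Hypothetical stand-ins for ColumnTypeData and InteractionPayload.
  final case class Column(name: String, dataType: String)
  final case class Candidate(left: String, right: String, outputName: String)

  // Produces one candidate for each unordered pair of distinct columns.
  def candidates(columns: Seq[Column]): List[Candidate] =
    columns.combinations(2).map {
      case Seq(a, b) => Candidate(a.name, b.name, s"i_${a.name}_${b.name}")
    }.toList
}
```

    For n feature columns this yields n * (n - 1) / 2 candidates, which is why candidates are tested before they are added to the feature set.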

  27. def generateNominalIndexesInteractionFields(payload: FeatureInteractionCollection): NominalDataCollection

    Method for converting nominal interaction fields to a new StringIndexed value to preserve information type and eliminate the possibility of data distribution skew

    payload

    FeatureInteractionCollection of the source parents and their interacted children fields

    returns

    NominalDataCollection payload containing a DataFrame that has new StringIndexed fields for nominal interactions and the fields that need to be seen as included in the final feature vector

    Attributes
    protected[com.databricks.labs.automl.feature]
    Definition Classes
    FeatureInteractionBase
    Since

    0.6.2

  28. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  29. def getFieldType(fieldType: String): structures.FieldEncodingType.Value

    Attributes
    protected[com.databricks.labs.automl.feature]
    Definition Classes
    FeatureInteractionBase
  30. def getModelType(modelingType: String): structures.ModelingType.Value

    Attributes
    protected[com.databricks.labs.automl.feature]
    Definition Classes
    FeatureInteractionBase
  31. def getRetentionMode(retentionMode: String): structures.InteractionRetentionMode.Value

    Attributes
    protected[com.databricks.labs.automl.feature]
    Definition Classes
    FeatureInteractionBase
  32. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  33. def interactProduct(df: DataFrame, candidate: InteractionPayload): DataFrame

    Method for generating a product interaction between feature columns

    df

    A DataFrame to add a field for an interaction between two columns

    candidate

    InteractionPayload information about the two parent columns and the name of the new interaction column to be created.

    returns

    A modified DataFrame with the new column.

    Attributes
    protected[com.databricks.labs.automl.feature]
    Definition Classes
    FeatureInteractionBase
    Since

    0.6.2
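    A product interaction multiplies the two parent columns row by row and stores the result under a new column name. The sketch below shows the idea on rows modelled as maps of column name to value; the Row alias, ProductSketch, and the explicit left, right, and outputName parameters are hypothetical simplifications of the DataFrame and InteractionPayload used by the library.

```scala
object ProductSketch {
  // A row modelled as column name -> numeric value (an illustration only).
  type Row = Map[String, Double]

  // Returns the rows with an extra column holding left * right.
  def interactProduct(
      rows: Seq[Row],
      left: String,
      right: String,
      outputName: String
  ): Seq[Row] =
    rows.map(row => row + (outputName -> (row(left) * row(right))))
}
```

    The parent columns are left in place, so the new column can be evaluated against them before deciding whether to keep it.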

  34. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  35. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  36. final def notify(): Unit

    Definition Classes
    AnyRef
  37. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  38. def regenerateFeatureVector(df: DataFrame, preInteractedFields: Array[String], interactedFields: Array[String], featureCol: String): VectorAssemblyOutput

    Helper method for recreating the feature vector after interactions have been completed on individual columns

    df

    DataFrame containing the interacted fields with the original feature vector dropped

    preInteractedFields

    Fields making up the original vector before interaction

    interactedFields

    Interaction candidate fields that have been selected to be included in the final feature vector

    featureCol

    Name of the feature vector field

    returns

    DataFrame with a new feature vector.

    Attributes
    protected[com.databricks.labs.automl.feature]
    Definition Classes
    FeatureInteractionBase
    Since

    0.6.2

  39. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  40. def toString(): String

    Definition Classes
    AnyRef → Any
  41. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  42. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  43. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
