FeatureEvaluator

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final val AGGREGATE_COLUMN: String

Definition Classes
FeatureInteractionBase
final val COUNT_COLUMN: String

Definition Classes
FeatureInteractionBase
final val ENTROPY_COLUMN: String

Definition Classes
FeatureInteractionBase
final val FIELD_ENTROPY_COLUMN: String

Definition Classes
FeatureInteractionBase
final val INDEXED_SUFFIX: String

Definition Classes
FeatureInteractionBase
final val QUANTILE_PRECISION: Double

Definition Classes
FeatureInteractionBase
final val QUANTILE_THRESHOLD: Double

Definition Classes
FeatureInteractionBase
final val RATIO_COLUMN: String

Definition Classes
FeatureInteractionBase
final val TOTAL_RATIO_COLUMN: String

Definition Classes
FeatureInteractionBase
final val VARIANCE_STATISTIC: String

Definition Classes
FeatureInteractionBase
final def asInstanceOf[T0]: T0

Definition Classes
Any
def calculateCategoricalInformationGain(df: DataFrame, labelColumn: String, fieldToTest: String, totalRecordCount: Long): Double

Helper method for calculating the Information Gain of a feature field
df
DataFrame that contains at least the fieldToTest and the Label Column
labelColumn
The label column of the data set
fieldToTest
The field to calculate Information Gain for
totalRecordCount
Total number of records in the data set
returns
The Information Gain of the field

Since
0.7.0
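
The Information Gain of a categorical field can be illustrated without Spark. The sketch below is not the library's implementation: it assumes base-2 Shannon entropy and works on an in-memory sequence of (field value, label) pairs instead of a DataFrame, so the record count is taken from the sequence rather than passed as totalRecordCount. The object and method names are hypothetical.

```scala
object InformationGainSketch {

  // Shannon entropy (base 2) of a sequence of class labels.
  def entropy(labels: Seq[String]): Double = {
    val total = labels.size.toDouble
    labels.groupBy(identity).values.map { group =>
      val p = group.size / total
      -p * (math.log(p) / math.log(2.0))
    }.sum
  }

  // Information gain from splitting rows of (fieldValue, label) on the field value:
  // entropy of the labels before the split minus the size-weighted entropy after it.
  def informationGain(rows: Seq[(String, String)]): Double = {
    val total = rows.size.toDouble
    val parentEntropy = entropy(rows.map(_._2))
    val weightedChildEntropy = rows.groupBy(_._1).values.map { group =>
      (group.size / total) * entropy(group.map(_._2))
    }.sum
    parentEntropy - weightedChildEntropy
  }
}
```

A field whose values separate the labels perfectly recovers the full entropy of the label column, while a field that is independent of the label yields a gain of zero.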
def calculateCategoricalVariance(df: DataFrame, labelColumn: String, fieldToTest: String): Double

Method for calculating the variance of a categorical (nominal) field. The label column is split by the field's groups (a first-layer split), and the per-group variance of the label values determines the minimum variance achievable in the label column.
df
DataFrame that contains the label column and the field under test for minimum by-group variance
labelColumn
The label column of the data set
fieldToTest
The feature column to test for variance reduction
returns
The minimum split variance of the aggregated label data by nominal group of the fieldToTest

Since
0.7.0
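
As a rough illustration of a minimum by-group variance, the following sketch groups label values by a nominal field and returns the smallest per-group variance. It is an assumption-laden stand-in, not the library's code: it uses population variance (the library may use sample variance), operates on in-memory (field value, label) pairs, and the names are hypothetical.

```scala
object CategoricalVarianceSketch {

  // Population variance of a non-empty sequence (assumption: the library
  // may compute sample variance instead).
  def variance(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum / xs.size
  }

  // Group the label values by the nominal field and keep the smallest
  // per-group variance. Requires at least one row.
  def minGroupVariance(rows: Seq[(String, Double)]): Double =
    rows.groupBy(_._1).values.map(group => variance(group.map(_._2))).min
}
```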
def calculateContinuousInformationGain(df: DataFrame, labelCol: String, fieldToTest: String, totalRecordCount: Long, bucketCount: Int): Double

Helper method for handling Information Gain Calculation for classification data set when dealing with continuous (numeric) feature elements. The continuous feature will be split upon the configured value of _continuousDiscretizerBucketCount, which is set by overriding .setContinuousDiscretizerBucketCount(<Int>)
df
DataFrame that contains the feature to test and the label column
labelCol
The label column of the data set
fieldToTest
The feature field that is under test for entropy evaluation
totalRecordCount
Total number of elements in the data set.
returns
Information Gain associated with the feature field based on splits that could occur.

Since
0.7.0
def calculateContinuousVariance(df: DataFrame, labelColumn: String, fieldToTest: String, bucketCount: Int): Double

Method for calculating the variance of a continuous field for variance reduction in the label column based on bucketized grouping of the field under test.
df
DataFrame that contains the label column and the field under test of continuous numeric type
labelColumn
The label column of the data set
fieldToTest
The continuous numeric field to evaluate
bucketCount
The number of quantile buckets into which the field under test is grouped, in order to simulate where a decision split would occur.
returns
The minimum split variance of each of the buckets that have been created

Since
0.7.0
def calculatePercentageChange(before: Double, after: Double): Double

Method for evaluating the percentage change in the score metric, used for normalization.
before
Score of a parent feature
after
Score of an interaction feature
returns
the percentage change

Attributes
protected[com.databricks.labs.automl.feature]
Definition Classes
FeatureInteractionBase
Since
0.6.2
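
The exact formula is not stated on this page. A common definition of percentage change, shown here purely as an assumption, divides the difference by the magnitude of the starting value. The object name is hypothetical, and the sketch does not guard against a before value of zero.

```scala
object PercentageChangeSketch {

  // Assumed formula: (after - before) / |before| * 100.
  // A `before` of 0.0 produces Infinity or NaN; the library's handling
  // of that case is not documented here.
  def percentageChange(before: Double, after: Double): Double =
    (after - before) / math.abs(before) * 100.0
}
```

Under this definition, a parent score of 0.5 and an interaction score of 0.75 give a change of 50.0.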
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def discretizeContinuousFeature(df: DataFrame, fieldToTest: String, bucketCount: Int): DataFrame

Helper method for converting a continuous feature to a discrete bucketed value so that entropy can be calculated effectively for the feature.
df
DataFrame containing at least the field to test in continuous numeric format
fieldToTest
The name of the field under conversion
bucketCount
The number of quantile buckets to split the continuous field into
returns
A DataFrame with the continuous value converted to a quantized bucket membership value.

Since
0.7.0
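
To make the idea of quantized bucket membership concrete, here is a small in-memory sketch. It is not the library's implementation (which works on a DataFrame and may rely on a Spark discretizer; this page does not say). It assumes a nearest-rank choice of split points, and all names are hypothetical.

```scala
object QuantileBucketSketch {

  // Split points at the i/bucketCount quantiles, using a simple
  // nearest-rank lookup on the sorted values.
  def splitPoints(values: Seq[Double], bucketCount: Int): Seq[Double] = {
    val sorted = values.sorted
    (1 until bucketCount).map { i =>
      val idx = math.min(sorted.size - 1, (i.toDouble / bucketCount * sorted.size).toInt)
      sorted(idx)
    }.distinct
  }

  // Bucket index = number of split points less than or equal to the value.
  def bucketOf(value: Double, splits: Seq[Double]): Int =
    splits.count(_ <= value)

  // Replace every continuous value with its bucket index.
  def discretize(values: Seq[Double], bucketCount: Int): Seq[Int] = {
    val splits = splitPoints(values, bucketCount)
    values.map(v => bucketOf(v, splits))
  }
}
```

Once a continuous field has been reduced to bucket indices like this, it can be treated as a nominal field for entropy or by-group variance calculations.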
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def extractAndValidateSchema(schema: StructType, featureVector: String): Unit

Helper method for extracting field names and ensuring that the feature vector is present
schema
Schema of the DataFrame undergoing feature interaction
featureVector
The name of the features column
returns
Array of column names of the DataFrame

Since
0.6.2
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def generateInteractionCandidates(featureColumns: Array[ColumnTypeData]): Array[InteractionPayload]

Method for generating a collection of Interaction Candidates to be tested and applied to the feature set if the tests for inclusion pass.
featureColumns
List of the columns that make up the feature vector
returns
Array of InteractionPayload values.

Attributes
protected[com.databricks.labs.automl.feature]
Definition Classes
FeatureInteractionBase
Since
0.6.2
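
One plausible way to build interaction candidates is to enumerate every unordered pair of feature columns. The sketch below assumes exactly that, which this page does not confirm. The Candidate case class is a hypothetical stand-in for InteractionPayload, and the naming scheme for the output column is invented for illustration.

```scala
object InteractionCandidateSketch {

  // Hypothetical stand-in for InteractionPayload: two parent columns
  // and the name of the interaction column to be created.
  final case class Candidate(left: String, right: String, outputName: String)

  // Enumerate every unordered pair of distinct column names.
  def candidates(columns: Seq[String]): List[Candidate] =
    columns.combinations(2).map { pair =>
      Candidate(pair(0), pair(1), s"i_${pair(0)}_${pair(1)}")
    }.toList
}
```

For n feature columns this produces n(n-1)/2 candidates, which is why each candidate is tested for inclusion before being added to the feature set.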
def generateNominalIndexesInteractionFields(payload: FeatureInteractionCollection): NominalDataCollection

Method for converting nominal interaction fields to a new StringIndexed value to preserve information type and eliminate the possibility of data distribution skew
payload
FeatureInteractionCollection of the source parents and their interacted children fields
returns
NominalDataCollection payload containing a DataFrame that has new StringIndexed fields for nominal interactions, along with the fields that must be included in the final feature vector

Attributes
protected[com.databricks.labs.automl.feature]
Definition Classes
FeatureInteractionBase
Since
0.6.2
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def getFieldType(fieldType: String): structures.FieldEncodingType.Value

Attributes
protected[com.databricks.labs.automl.feature]
Definition Classes
FeatureInteractionBase
def getModelType(modelingType: String): structures.ModelingType.Value

Attributes
protected[com.databricks.labs.automl.feature]
Definition Classes
FeatureInteractionBase
def getRetentionMode(retentionMode: String): structures.InteractionRetentionMode.Value

Attributes
protected[com.databricks.labs.automl.feature]
Definition Classes
FeatureInteractionBase
def hashCode(): Int

Definition Classes
AnyRef → Any
def interactProduct(df: DataFrame, candidate: InteractionPayload): DataFrame

Method for generating a product interaction between feature columns
df
A DataFrame to add a field for an interaction between two columns
candidate
InteractionPayload information about the two parent columns and the name of the new interaction column to be created.
returns
A modified DataFrame with the new column.

Attributes
protected[com.databricks.labs.automl.feature]
Definition Classes
FeatureInteractionBase
Since
0.6.2
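
A product interaction multiplies the two parent columns row by row and stores the result under the new column name. The sketch below shows this on in-memory rows rather than a DataFrame. The Candidate case class and Row alias are hypothetical stand-ins for InteractionPayload and a DataFrame row, and the sketch assumes both parent columns are numeric and present in every row.

```scala
object ProductInteractionSketch {

  // Hypothetical stand-in for InteractionPayload.
  final case class Candidate(left: String, right: String, outputName: String)

  // Each row maps a column name to a numeric value.
  type Row = Map[String, Double]

  // Add a new column holding the product of the two parent columns.
  // Throws NoSuchElementException if a parent column is missing from a row.
  def interactProduct(rows: Seq[Row], candidate: Candidate): Seq[Row] =
    rows.map { row =>
      row + (candidate.outputName -> (row(candidate.left) * row(candidate.right)))
    }
}
```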
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def regenerateFeatureVector(df: DataFrame, preInteractedFields: Array[String], interactedFields: Array[String], featureCol: String): VectorAssemblyOutput

Helper method for recreating the feature vector after interactions have been completed on individual columns
df
DataFrame containing the interacted fields with the original feature vector dropped
preInteractedFields
Fields making up the original vector before interaction
interactedFields
Interaction candidate fields that have been selected to be included in the final feature vector
featureCol
Name of the feature vector field
returns
DataFrame with a new feature vector.

Attributes
protected[com.databricks.labs.automl.feature]
Definition Classes
FeatureInteractionBase
Since
0.6.2
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

object FeatureEvaluator extends FeatureInteractionBase