com.databricks.labs.automl.executor.config

ConfigurationGenerator

Related Docs: object ConfigurationGenerator | package config


class ConfigurationGenerator extends ConfigurationDefaults

Main Configuration Generator utility class, used for generating a modeling configuration to execute the autoML framework.

Since

0.5

Linear Supertypes

ConfigurationDefaults, AnyRef, Any

Instance Constructors

  1. new ConfigurationGenerator(modelFamily: String, predictionType: String, genericConfig: GenericConfig)


    modelFamily

    The model family to run (e.g. 'RandomForest'). Allowable options: "Trees", "GBT", "RandomForest", "LinearRegression", "LogisticRegression", "XGBoost", "MLPC", "SVM"

    predictionType

    The modeling type to run (e.g. 'classifier'). Allowable options: "classifier" or "regressor"

    genericConfig

    Configuration object from GenericConfigGenerator
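
    As a rough illustration of the constructor contract and of the chained-setter style implied by the ConfigurationGenerator.this.type return types, the following self-contained sketch uses a hypothetical stand-in class (SketchGenerator is not part of the library). Only the allowable option strings come from the parameter descriptions above; the validation logic and its placement in the constructor are assumptions.

```scala
object FluentSketch {
  // Allowable options copied from the constructor documentation above.
  val modelFamilies: Set[String] = Set(
    "Trees", "GBT", "RandomForest", "LinearRegression",
    "LogisticRegression", "XGBoost", "MLPC", "SVM")
  val predictionTypes: Set[String] = Set("classifier", "regressor")

  // Hypothetical stand-in for the real class; illustration only.
  class SketchGenerator(val modelFamily: String, val predictionType: String) {
    // require throws IllegalArgumentException when the condition is false.
    require(modelFamilies.contains(modelFamily),
      s"Unsupported model family: $modelFamily")
    require(predictionTypes.contains(predictionType),
      s"Unsupported prediction type: $predictionType")

    private var autoStopping = false // documented default: Off
    private var naFill = true        // documented default: On

    // Returning this.type lets calls be chained on the same instance.
    def autoStoppingOn(): this.type = { autoStopping = true; this }
    def autoStoppingOff(): this.type = { autoStopping = false; this }
    def naFillOn(): this.type = { naFill = true; this }
    def naFillOff(): this.type = { naFill = false; this }

    def autoStoppingFlag: Boolean = autoStopping
    def naFillFlag: Boolean = naFill
  }
}
```

    With this stand-in, new FluentSketch.SketchGenerator("RandomForest", "classifier").autoStoppingOn().naFillOn() returns the same instance with both switches set, while an unknown family such as "KNN" fails fast with an IllegalArgumentException.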

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final val allowableCardinalilties: List[String]

    Definition Classes
    ConfigurationDefaults
  5. final val allowableCategoricalFilterModes: List[String]

    Definition Classes
    ConfigurationDefaults
  6. final val allowableCharacterFillStats: List[String]

    Definition Classes
    ConfigurationDefaults
  7. final val allowableClassificationScoringMetrics: List[String]

    Definition Classes
    ConfigurationDefaults
  8. final val allowableDateTimeConversionTypes: List[String]

    Static restrictions

    Definition Classes
    ConfigurationDefaults
  9. final val allowableDateTimeConversions: List[String]

    Definition Classes
    ConfigurationDefaults
  10. final val allowableEvolutionStrategies: List[String]

    Definition Classes
    ConfigurationDefaults
  11. final val allowableFeatureImportanceCutoffTypes: List[String]

    Definition Classes
    ConfigurationDefaults
  12. final val allowableFeatureInteractionModes: List[String]

    Definition Classes
    ConfigurationDefaults
  13. final val allowableGeneticMBORegressorTypes: List[String]

    Definition Classes
    ConfigurationDefaults
  14. final val allowableHyperSpaceModelTypes: List[String]

    Definition Classes
    ConfigurationDefaults
  15. final val allowableInitialGenerationIndexMixingModes: List[String]

    Definition Classes
    ConfigurationDefaults
  16. final val allowableInitialGenerationModes: List[String]

    Definition Classes
    ConfigurationDefaults
  17. final val allowableKMeansDistanceMeasurements: List[String]

    Definition Classes
    ConfigurationDefaults
  18. final val allowableLabelBalanceModes: List[String]

    Definition Classes
    ConfigurationDefaults
  19. final val allowableMlFlowLoggingModes: List[String]

    Definition Classes
    ConfigurationDefaults
  20. final val allowableMutationMagnitudeMode: List[String]

    Definition Classes
    ConfigurationDefaults
  21. final val allowableMutationModes: List[String]

    Definition Classes
    ConfigurationDefaults
  22. final val allowableMutationStrategies: List[String]

    Definition Classes
    ConfigurationDefaults
  23. final val allowableNAFillModes: List[String]

    Definition Classes
    ConfigurationDefaults
  24. final val allowableNumericFillStats: List[String]

    Definition Classes
    ConfigurationDefaults
  25. final val allowableOutlierFilterBounds: List[String]

    Definition Classes
    ConfigurationDefaults
  26. final val allowablePearsonFilterDirections: List[String]

    Definition Classes
    ConfigurationDefaults
  27. final val allowablePearsonFilterModes: List[String]

    Definition Classes
    ConfigurationDefaults
  28. final val allowablePearsonFilterStats: List[String]

    Definition Classes
    ConfigurationDefaults
  29. final val allowableRegressionScoringMetrics: List[String]

    Definition Classes
    ConfigurationDefaults
  30. final val allowableScalers: List[String]

    Definition Classes
    ConfigurationDefaults
  31. final val allowableScoringOptimizationStrategies: List[String]

    Definition Classes
    ConfigurationDefaults
  32. final val allowableTrainSplitMethods: List[String]

    Definition Classes
    ConfigurationDefaults
  33. final val allowableVectorMutationMethods: List[String]

    Definition Classes
    ConfigurationDefaults
  34. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  35. def autoStoppingOff(): ConfigurationGenerator.this.type

    Boolean switch for setting Auto Stopping Off

    Note

    Default: Off

  36. def autoStoppingOn(): ConfigurationGenerator.this.type

    Boolean switch for setting Auto Stopping On

    Note

    Early stopping will invalidate the progress measurement system (due to non-determinism).

    Early termination will not occur immediately. Future objects already committed will continue to run, but no new actions will be enqueued once a stopping criterion is met.

    Default: Off

  37. def cardinalitySwitchOff(): ConfigurationGenerator.this.type

    Setter switch for turning cardinality switch off.

    Since

    0.5.2

    Note

    Default: true

    Not recommended for exploratory data set features.

  38. def cardinalitySwitchOn(): ConfigurationGenerator.this.type

    Setter switch for turning the cardinality switch on. This switch sets whether a cardinality check is performed on StringIndexed columns.

    Since

    0.5.2

    Note

    Default: true

  39. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. def covarianceFilterOff(): ConfigurationGenerator.this.type

    Boolean switch for turning Covariance filtering off

    Note

    Default: Off

  41. def covarianceFilterOn(): ConfigurationGenerator.this.type

    Boolean switch for turning Covariance filtering on

    Note

    Default: Off

  42. def dataPrepCachingOff(): ConfigurationGenerator.this.type

    Boolean switch for setting the Data Prep Caching Off

    Note

    Depending on the size and partitioning of the data set, caching may or may not improve performance.

    Default: On

  43. def dataPrepCachingOn(): ConfigurationGenerator.this.type

    Boolean switch for setting the Data Prep Caching On

    Note

    Depending on the size and partitioning of the data set, caching may or may not improve performance.

    Default: On

  44. def deltaCheckBackingDirectoryRemovalOff(): ConfigurationGenerator.this.type

  45. def deltaCheckBackingDirectoryRemovalOn(): ConfigurationGenerator.this.type

  46. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  47. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  48. def featureInteractionOff(): ConfigurationGenerator.this.type

    Boolean switch for turning featureInteraction off

    Since

    0.6.2

  49. def featureInteractionOn(): ConfigurationGenerator.this.type

    Boolean switch for setting featureInteraction on. This setting will, in conjunction with the settings for featureInteraction elements in the config, perform pair-wise product interactions of all elements of the feature vector, retaining either all or some of those interactions for inclusion in the feature vector. For classification tasks, InformationGain is the metric used to decide inclusion (for modes other than 'all'). For regression tasks, Variance is the metric.

    Since

    0.6.2

  50. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  51. def generateFeatureImportanceConfig: MainConfig

  52. def generateMainConfig: MainConfig

  53. def generateTreeSplitConfig: MainConfig

  54. var genericConfig: GenericConfig


    Configuration object from GenericConfigGenerator

  55. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  56. def getDefaultConfig(modelFamily: String, predictionType: String): InstanceConfig

    Definition Classes
    ConfigurationDefaults
  57. def getInstanceConfig: InstanceConfig


    Getters

  58. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  59. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  60. def naFillOff(): ConfigurationGenerator.this.type

    Boolean switch for turning off naFill actions

    Note

    HIGHLY RECOMMENDED TO NOT TURN OFF

    Default: On

  61. def naFillOn(): ConfigurationGenerator.this.type

    Boolean switch for turning on naFill actions

    Note

    HIGHLY RECOMMENDED TO LEAVE ON.

    Default: On

  62. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  63. final def notify(): Unit

    Definition Classes
    AnyRef
  64. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  65. def oneHotEncodeFlag(family: FamilyValidator): Boolean

    Definition Classes
    ConfigurationDefaults
  66. def oneHotEncodeOff(): ConfigurationGenerator.this.type

    Boolean switch for turning off One Hot Encoding

    Note

    Default: Off for Tree based algorithms, On for all others.

  67. def oneHotEncodeOn(): ConfigurationGenerator.this.type

    Boolean switch for turning One Hot Encoding of string and character features on

    Note

    Turning One Hot Encoding on for a tree-based algorithm (XGBoost, RandomForest, Trees, GBT) is not recommended. Introducing synthetic dummy variables in a tree algorithm will force the creation of sparse tree splits.

    Default: Off for Tree based algorithms, On for all others.

    See also

    See https://towardsdatascience.com/one-hot-encoding-is-making-your-tree-based-ensembles-worse-heres-why-d64b282b5769 for a full explanation.
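
    The documented default can be summarised as a small lookup. This is a sketch only: defaultOneHotEncode is a hypothetical helper, the tree-based family list is the one given in the note above, and the real oneHotEncodeFlag takes a FamilyValidator rather than a String.

```scala
object OneHotDefaultSketch {
  // Families the note above lists as tree-based.
  val treeBased: Set[String] = Set("XGBoost", "RandomForest", "Trees", "GBT")

  // Hypothetical helper: Off for tree-based families, On for all others.
  def defaultOneHotEncode(modelFamily: String): Boolean =
    !treeBased.contains(modelFamily)
}
```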

  68. def outlierFilterOff(): ConfigurationGenerator.this.type

    Boolean switch for turning outlier filtering off

    Note

    Default: Off

  69. def outlierFilterOn(): ConfigurationGenerator.this.type

    Boolean switch for turning outlier filtering on

    Note

    Default: Off

  70. def pearsonFilterOff(): ConfigurationGenerator.this.type

    Boolean switch for turning Pearson filtering off

    Note

    Default: Off

  71. def pearsonFilterOn(): ConfigurationGenerator.this.type

    Boolean switch for turning Pearson filtering on

    Note

    Default: Off

  72. def pipelineDebugFlagOff(value: Boolean): ConfigurationGenerator.this.type

  73. def pipelineDebugFlagOn(value: Boolean): ConfigurationGenerator.this.type

  74. def scalingFlag(family: FamilyValidator): Boolean

    Definition Classes
    ConfigurationDefaults
  75. def scalingOff(): ConfigurationGenerator.this.type

    Boolean switch for turning scaling Off

    Note

    Default: Off for Tree based algorithms, On for all others.

  76. def scalingOn(): ConfigurationGenerator.this.type

    Boolean switch for turning scaling On

    Note

    For tree-based algorithms (RandomForest, XGBoost, GBT, Trees), turning this on is not necessary and can adversely affect model performance.

    Default: Off for Tree based algorithms, On for all others.

  77. def setAutoStoppingFlag(value: Boolean): ConfigurationGenerator.this.type

    Boolean switch for setting the state of autoStoppingFlag

    value

    Boolean

  78. def setConfig(value: InstanceConfig): ConfigurationGenerator.this.type

    Helper method for copying a pre-defined InstanceConfig to a new instance.

    value

    InstanceConfig object

  79. def setCovarianceCutoffHigh(value: Double): ConfigurationGenerator.this.type

    Setter
    Covariance Cutoff for specifying the feature-to-feature correlation statistic upper cutoff boundary

    value

    Double: Threshold Cutoff Value

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Example:
    1. For feature columns A, B, and C, if A<->B is 0.02, A<->C is 0.1, B<->C is 0.85, with a value set of 0.8,
      Column C would be removed from the feature vector for having a high value of the correlation statistic.

    Exceptions thrown

    IllegalArgumentException if the value is <= -1.0

    Note

    WARNING This setting is not recommended to be used in a production use case and is only potentially useful for data exploration and experimentation.

    Default: 0.99
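
    The example given for this setter can be reproduced with a small sketch. It assumes that, for each pair whose correlation statistic exceeds the cutoff, the right-hand column of the pair is the one dropped. The toolkit's actual selection rule is not stated here, so columnsToDrop and this choice of which column to drop are hypothetical.

```scala
object CovarianceCutoffSketch {
  // Pairwise correlation statistic keyed by (left column, right column).
  type Correlations = Map[(String, String), Double]

  // Hypothetical helper: drop the right-hand column of any pair above the cutoff.
  def columnsToDrop(correlations: Correlations, cutoffHigh: Double): Set[String] =
    correlations.collect {
      case ((_, right), corr) if corr > cutoffHigh => right
    }.toSet

  // The values used in the documented example.
  val example: Correlations = Map(
    ("A", "B") -> 0.02,
    ("A", "C") -> 0.10,
    ("B", "C") -> 0.85)
}
```

    With a cutoff of 0.8, only the B/C pair exceeds the threshold, so the sketch reports column C for removal, matching the example. With the default of 0.99 nothing is dropped.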

  80. def setCovarianceCutoffLow(value: Double): ConfigurationGenerator.this.type

    Setter
    Covariance Cutoff for specifying the feature-to-feature correlation statistic lower cutoff boundary

    value

    Double: Threshold Cutoff Value

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Example:
    1. For feature columns A, B, and C, if A->B is 0.02, A->C is 0.1, B->C is 0.85, with a value set of 0.05,
      Column A would be removed from the feature vector for having a low value of the correlation statistic.

    Exceptions thrown

    IllegalArgumentException if the value is <= -1.0

    Note

    WARNING the lower threshold boundary for correlation is less frequently used. Filtering of auto-correlated features is done primarily through .setCovarianceCutoffHigh values lower than the default of 0.99

    WARNING This setting is not recommended to be used in a production use case and is only potentially useful for data exploration and experimentation.

    Default: -0.99

  81. def setCovarianceFilterFlag(value: Boolean): ConfigurationGenerator.this.type

    Boolean switch for setting the state of covarianceFilterFlag

    value

    Boolean

  82. def setDataPrepCachingFlag(value: Boolean): ConfigurationGenerator.this.type

    Boolean switch for setting the state of DataPrepCachingFlag

    value

    Boolean

  83. def setDataPrepParallelism(value: Int): ConfigurationGenerator.this.type

    Setter for defining the number of concurrent threads allocated to performing asynchronous data prep tasks within the feature engineering aspect of this application.

    value

    Int: A value that must be greater than zero.

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.0

    Exceptions thrown

    IllegalArgumentException if a value less than or equal to zero is supplied.

    Note

    This value has an upper limit, depending on driver size, that will restrict the efficacy of the asynchronous tasks within the pool. Setting this too high may cause cluster instability.
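
    To make the idea of a bounded pool of asynchronous tasks concrete, here is a minimal sketch using a fixed-size thread pool from the Scala and Java standard libraries. It is not the toolkit's implementation: runAll and the one-minute timeout are hypothetical, and only the "greater than zero" check mirrors the documented contract.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object DataPrepParallelismSketch {
  // Hypothetical helper: run tasks with at most `parallelism` threads.
  def runAll[A](parallelism: Int)(tasks: Seq[() => A]): Seq[A] = {
    require(parallelism > 0, s"parallelism must be greater than zero, got $parallelism")
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each task becomes a Future; the pool size caps how many run at once.
      val futures = tasks.map(task => Future(task()))
      Await.result(Future.sequence(futures), 1.minute)
    } finally pool.shutdown()
  }
}
```

    Results come back in submission order regardless of which task finishes first, because Future.sequence preserves the order of its input.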

  84. def setDataReductionFactor(value: Double): ConfigurationGenerator.this.type

  85. def setFeatureImportanceCutoffType(value: String): ConfigurationGenerator.this.type

  86. def setFeatureImportanceCutoffValue(value: Double): ConfigurationGenerator.this.type

  87. def setFeatureInteractionContinuousDiscretizerBucketCount(value: Int): ConfigurationGenerator.this.type

    Setter for determining the behavior of continuous feature columns. In order to calculate Entropy for a continuous variable, the distribution must be converted to nominal values for estimation of per-split information gain. This setting defines how many nominal categorical values to create out of a continuously distributed feature in order to calculate Entropy.

    value

    Int -> must be greater than 1

    Since

    0.6.2

    Exceptions thrown

    IllegalArgumentException if the value specified is <= 1
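
    The following sketch shows one way a continuous feature could be turned into nominal values before an entropy calculation. Equal-width bucketing is an assumption made for illustration; the discretization strategy the toolkit actually uses is not described here, and bucketize and entropy are hypothetical names.

```scala
object DiscretizedEntropySketch {
  // Assign each value to one of `bucketCount` equal-width buckets.
  def bucketize(values: Seq[Double], bucketCount: Int): Seq[Int] = {
    require(bucketCount > 1, "bucketCount must be greater than 1")
    val (lo, hi) = (values.min, values.max)
    val width = (hi - lo) / bucketCount
    values.map { v =>
      if (width == 0.0) 0 // constant column: everything lands in one bucket
      else math.min(((v - lo) / width).toInt, bucketCount - 1)
    }
  }

  // Shannon entropy (base 2) of a sequence of nominal values.
  def entropy[A](labels: Seq[A]): Double = {
    val n = labels.size.toDouble
    labels.groupBy(identity).values.map { group =>
      val p = group.size / n
      -p * (math.log(p) / math.log(2))
    }.sum
  }
}
```

    A larger bucket count preserves more of the distribution's shape at the cost of sparser buckets, which is the trade-off this setter exposes.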

  88. def setFeatureInteractionFlag(value: Boolean): ConfigurationGenerator.this.type

    Setter for defining the state of the featureInteractionFlag

    value

    Boolean on/off

    Since

    0.6.2

  89. def setFeatureInteractionParallelism(value: Int): ConfigurationGenerator.this.type

    Setter for configuring the concurrent count for scoring of feature interaction candidates. Due to the nature of these operations, the configuration here may need to be set differently to that of the modeling and general feature engineering phases of the toolkit. This is highly dependent on the row count of the data set being submitted.

    value

    Int -> must be greater than 0

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.2

    Exceptions thrown

    IllegalArgumentException if the value is < 1

  90. def setFeatureInteractionRetentionMode(value: String): ConfigurationGenerator.this.type

    Setter for determining the mode of operation for inclusion of interacted features. Modes are:

    • all -> Includes all interactions between all features (after string indexing of categorical values).
    • optimistic -> The interaction is kept if its Information Gain / Variance, compared to at least ONE of its parents, is above the threshold set by featureInteractionTargetInteractionPercentage (e.g. if the IG of the left parent is 0.5 and that of the right parent is 0.9, with the threshold set at 10, an interaction with an IG of 0.42 would be rejected, but one with an IG of 0.46 would be kept).
    • strict -> The threshold percentage must be met for BOTH parents (in the above example, the IG for the interaction would have to be > 0.81 in order to be included in the feature vector).
    value

    String -> one of: 'all', 'optimistic', or 'strict'

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.2

    Exceptions thrown

    IllegalArgumentException if the specified value submitted is not permitted
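
    The three modes can be sketched as a single predicate. The rule used here, that an interaction must score at least parent × (1 − threshold / 100), is inferred from the worked numbers in the mode descriptions (0.5 gives 0.45, 0.9 gives 0.81) and is not confirmed by the source; retain and meets are hypothetical names.

```scala
object RetentionModeSketch {
  // Assumed rule: the threshold is a percentage allowance below the parent score.
  private def meets(interaction: Double, parent: Double, thresholdPct: Double): Boolean =
    interaction >= parent * (1.0 - thresholdPct / 100.0)

  // Hypothetical helper mirroring the three documented modes.
  def retain(mode: String, interaction: Double, left: Double,
             right: Double, thresholdPct: Double): Boolean = mode match {
    case "all"        => true
    case "optimistic" => meets(interaction, left, thresholdPct) ||
                         meets(interaction, right, thresholdPct)
    case "strict"     => meets(interaction, left, thresholdPct) &&
                         meets(interaction, right, thresholdPct)
    case other        => throw new IllegalArgumentException(s"Unsupported mode: $other")
  }
}
```

    Under this reading, an interaction scoring 0.46 against parents of 0.5 and 0.9 with a threshold of 10 passes in 'optimistic' mode but fails in 'strict' mode.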

  91. def setFeatureInteractionTargetInteractionPercentage(value: Double): ConfigurationGenerator.this.type

    Setter for establishing the minimum acceptable InformationGain or Variance allowed for an interaction candidate based on comparison to the scores of its parents.

    value

    Double in range of -inf -> inf

    Since

    0.6.2

  92. def setFillConfigCardinalityCheckMode(value: String): ConfigurationGenerator.this.type

    Setter for the cardinality check mode to be used. Available modes are "warn" and "silent". In "warn" mode, an exception will be thrown if the cardinality for a categorical column is above the threshold. In "silent" mode, the field will be ignored from processing and will not be included in the feature vector.

    value

    String: either "warn" or "silent"

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.5.2

    Exceptions thrown

    IllegalArgumentException if the mode supplied is not either "warn" or "silent"

    Note

    Default: "silent"
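
    The difference between the two modes can be sketched as follows. The helper name validate, the input shape (a map of column name to distinct count), and the exception type raised in "warn" mode are all assumptions; the source only states that an exception is thrown.

```scala
object CardinalityCheckSketch {
  // Hypothetical helper: returns the columns that stay in the feature vector.
  def validate(cardinalities: Map[String, Long], limit: Int, mode: String): Set[String] = {
    val (tooHigh, ok) = cardinalities.partition { case (_, distinct) => distinct > limit }
    mode match {
      case "silent" => ok.keySet // over-limit columns are dropped quietly
      case "warn" =>
        if (tooHigh.nonEmpty)
          throw new IllegalStateException(
            s"Cardinality above $limit for: ${tooHigh.keys.toSeq.sorted.mkString(", ")}")
        ok.keySet
      case other =>
        throw new IllegalArgumentException(s"Mode must be 'warn' or 'silent', got '$other'")
    }
  }
}
```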

  93. def setFillConfigCardinalityLimit(value: Int): ConfigurationGenerator.this.type

    Setter for overriding the default cardinality limit when validating whether a field should be considered for OneHotEncoding or StringIndexing

    value

    Int: The value above which a field will be declared to be of too high a cardinality for StringIndexing or OneHotEncoding

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.5.2

    Exceptions thrown

    java.lang.IllegalArgumentException if the number is <= 0

    Note

    Default: 200

  94. def setFillConfigCardinalityPrecision(value: Double): ConfigurationGenerator.this.type

    Setter for defining the precision calculation when in "approx" mode for cardinalityType. Must be in range 0 -> 1

    value

    Double: The precision for approximate distinct calculations for cardinality purposes

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.5.2

    Exceptions thrown

    java.lang.IllegalArgumentException if the Double supplied is outside of the range of 0 -> 1

  95. def setFillConfigCardinalitySwitch(value: Boolean): ConfigurationGenerator.this.type

    Setter for direct override of the cardinality switch

    Since

    0.5.2

    Note

    Default: true

  96. def setFillConfigCardinalityType(value: String): ConfigurationGenerator.this.type

    Setter for specifying the mode of cardinality checking [either "approx" for approximate distinct or "exact"]

    value

    String: either "approx" or "exact"

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.5.2

    Exceptions thrown

    IllegalArgumentException if a mode other than exact or approx is specified.

    Note

    Default: "exact"

  97. def setFillConfigCategoricalNAFillMap(value: Map[String, String]): ConfigurationGenerator.this.type

    Setter for providing a map of [Column Name -> String Fill Value] for manual by-column overrides. Any non-specified fields in this map will utilize the "auto" statistics-based fill paradigm to calculate and fill any NA values in non-numeric columns.

    value

    Map[String, String]: Column Name as String -> Fill Value as String

    Since

    0.5.2

    Note

    If fields are specified in here that are not part of the DataFrame's schema, an exception will be thrown.

    If naFillMode is specified as using Map Fill modes, this setter or the numeric na fill map MUST be set.
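
    A by-column override map of this kind might behave as in the sketch below, which works on plain Scala collections rather than a DataFrame. The fill helper, the Row alias, and the fallback function (standing in for the "auto" statistics-based fill) are hypothetical, and the use of IllegalArgumentException for unknown columns is an assumption; the source only says an exception is thrown.

```scala
object CategoricalFillMapSketch {
  // A row maps column name to an optional value; None represents NA.
  type Row = Map[String, Option[String]]

  // Hypothetical helper: per-column overrides first, fallback fill otherwise.
  def fill(rows: Seq[Row], schema: Set[String], overrides: Map[String, String],
           fallback: String => String): Seq[Map[String, String]] = {
    val unknown = overrides.keySet -- schema
    require(unknown.isEmpty, s"Columns not in schema: ${unknown.mkString(", ")}")
    rows.map { row =>
      row.map { case (column, value) =>
        column -> value.getOrElse(overrides.getOrElse(column, fallback(column)))
      }
    }
  }
}
```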

  98. def setFillConfigCharacterFillStat(value: String): ConfigurationGenerator.this.type

    Setter. Specifies the behavior of the naFill algorithm for character (String, Char, Boolean, Byte, etc.) fields. Generated through a df.summary() method.
    Available options are:
    "min" (least frequently occurring value)
    or
    "max" (most frequently occurring value)

    value

    String: member of allowable list

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException if an invalid entry is made.

    Note

    Default: "max"
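
    Taking the option descriptions at face value (least or most frequently occurring value), the choice of fill value could be sketched as below. fillValue is a hypothetical helper operating on an in-memory column, and how ties between equally frequent values are resolved is left unspecified, as it is in the source.

```scala
object CharacterFillStatSketch {
  // Hypothetical helper: pick the fill value for a character column.
  def fillValue(values: Seq[String], stat: String): String = {
    require(values.nonEmpty, "cannot derive a fill value from an empty column")
    val counts = values.groupBy(identity).map { case (v, group) => v -> group.size }
    stat match {
      case "max" => counts.maxBy(_._2)._1 // most frequently occurring value
      case "min" => counts.minBy(_._2)._1 // least frequently occurring value
      case other => throw new IllegalArgumentException(s"Unsupported fill stat: $other")
    }
  }
}
```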

  99. def setFillConfigCharacterNABlanketFillValue(value: String): ConfigurationGenerator.this.type

    Setter for providing a 'blanket override' value (fill all found categorical columns' missing values with this specified value).

    value

    String: A value to fill all categorical na values in the DataFrame with.

    Since

    0.5.2

  100. def setFillConfigFilterPrecision(value: Double): ConfigurationGenerator.this.type

    Setter for defining the precision for calculating the model type as per the label column

    value

    Double: Precision accuracy for approximate distinct calculation.

    Annotations
    @throws( classOf[AssertionError] )
    Since

    0.5.2

    Exceptions thrown

    java.lang.AssertionError If the value is outside of the allowable range of {0, 1}

    Note

    Setting this value to zero (0) for a large regression problem will incur a long processing time and an expensive shuffle.

  101. def setFillConfigNAFillMode(value: String): ConfigurationGenerator.this.type

    Mode for na fill. Available modes:

    • auto : Stats-based na fill for fields. Usage of .setNumericFillStat and .setCharacterFillStat will inform the type of statistics that will be used to fill.
    • mapFill : Custom by-column overrides to 'blanket fill' na values on a per-column basis. The categorical (string) fields are set via .setCategoricalNAFillMap while the numeric fields are set via .setNumericNAFillMap.
    • blanketFillAll : Fills all fields based on the values specified by .setCharacterNABlanketFillValue and .setNumericNABlanketFillValue. All NA's for the appropriate types will be filled in accordingly throughout all columns.
    • blanketFillCharOnly : Will use statistics to fill in numeric fields, but will replace all categorical character fields na values with a blanket fill value.
    • blanketFillNumOnly : Will use statistics to fill in character fields, but will replace all numeric fields na values with a blanket value.

    value

    String: Mode for NA Fill

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.5.2

    Exceptions thrown

    IllegalArgumentException if the mode specified is not supported.

  102. def setFillConfigNumericFillStat(value: String): ConfigurationGenerator.this.type

    Setter. Specifies the behavior of the naFill algorithm for numeric (continuous) fields.
    Values that are generated as potential fill candidates are set according to the available statistics that are calculated from a df.summary() method.
    Available options are:
    "min", "25p", "mean", "median", "75p", or "max"

    value

    String: member of allowable list.

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException if an invalid entry is made.

    Note

    Default: "mean"
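
    The six options map naturally onto summary statistics of a column. The sketch below computes them on an in-memory sequence using exact nearest-rank percentiles, whereas a Spark DataFrame summary reports approximate quartiles, so values may differ slightly on real data. fillValue and percentile are hypothetical helpers.

```scala
object NumericFillStatSketch {
  // Nearest-rank percentile on already sorted data (an exact stand-in for
  // the approximate percentiles a DataFrame summary would report).
  private def percentile(sorted: Seq[Double], p: Double): Double = {
    val rank = math.ceil(p / 100.0 * sorted.size).toInt
    sorted(math.max(rank - 1, 0))
  }

  // Hypothetical helper: resolve the fill value for one numeric column.
  def fillValue(values: Seq[Double], stat: String): Double = {
    require(values.nonEmpty, "cannot derive a fill value from an empty column")
    val sorted = values.sorted
    stat match {
      case "min"    => sorted.head
      case "25p"    => percentile(sorted, 25)
      case "mean"   => sorted.sum / sorted.size
      case "median" => percentile(sorted, 50)
      case "75p"    => percentile(sorted, 75)
      case "max"    => sorted.last
      case other    => throw new IllegalArgumentException(s"Unsupported fill stat: $other")
    }
  }
}
```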

  103. def setFillConfigNumericNABlanketFillValue(value: Double): ConfigurationGenerator.this.type

    Setter for providing a 'blanket override' value (fill all found numeric columns' missing values with this specified value)

    value

    Double: A value to fill all numeric na values in the DataFrame with.

    Since

    0.5.2

  104. def setFillConfigNumericNAFillMap(value: Map[String, AnyVal]): ConfigurationGenerator.this.type

    Setter for providing a map of [Column Name -> AnyVal Fill Value] (must be numeric). Any non-specified fields in this map will utilize the "auto" statistics-based fill paradigm to calculate and fill any NA values in numeric columns.

    value

    Map[String, AnyVal]: Column Name as String -> Fill Numeric Type Value

    Since

    0.5.2

    Note

    If fields are specified in here that are not part of the DataFrame's schema, an exception will be thrown.

    If naFillMode is specified as using Map Fill modes, this setter or the categorical na fill map MUST be set.

  105. def setInferenceConfigSaveLocation(value: String): ConfigurationGenerator.this.type

    Annotations
    @throws( classOf[IllegalArgumentException] )
  106. def setMlFlowAPIToken(value: String): ConfigurationGenerator.this.type

  107. def setMlFlowBestSuffix(value: String): ConfigurationGenerator.this.type

  108. def setMlFlowCustomRunTags(value: Map[String, AnyVal]): ConfigurationGenerator.this.type

    Setter
    Allows for setting a series of custom mlflow logging tags to an experiment run (universal across all iterations and models of the run) to be logged in mlflow as a custom tag key value pair

    value

    Map[String, AnyVal]: tag key as String -> tag value

    Note

    The mapped values can be of types: Double, Float, Long, Int, Short, Byte, Boolean, or String

  109. def setMlFlowExperimentName(value: String): ConfigurationGenerator.this.type

    Permalink
  110. def setMlFlowLogArtifactsFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink
  111. def setMlFlowLogArtifactsOff(): ConfigurationGenerator.this.type

    Permalink
  112. def setMlFlowLogArtifactsOn(): ConfigurationGenerator.this.type

    Permalink
  113. def setMlFlowLoggingFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink
  114. def setMlFlowLoggingMode(value: String): ConfigurationGenerator.this.type

    Permalink
  115. def setMlFlowLoggingOff(): ConfigurationGenerator.this.type

    Permalink
  116. def setMlFlowLoggingOn(): ConfigurationGenerator.this.type

    Permalink

    MLFlow Logging Config

  117. def setMlFlowModelSaveDirectory(value: String): ConfigurationGenerator.this.type

    Permalink
    Annotations
    @throws( classOf[IllegalArgumentException] )
  118. def setMlFlowTrackingURI(value: String): ConfigurationGenerator.this.type

    Permalink
  119. def setNaFillFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink

    Boolean switch for setting the state of naFillFlag

    value

    Boolean (whether to execute filling of na values on the DataFrame's non-ignored fields)

  120. def setNumericBoundaries(value: Map[String, (Double, Double)]): ConfigurationGenerator.this.type

    Permalink
  121. def setOneHotEncodeFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink

    Boolean switch for setting the state of oneHotEncodeFlag

    value

    Boolean

  122. def setOutlierContinuousDataThreshold(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter
    Defines the threshold used to determine whether a numeric field is classified as ordinal (categorical) or continuous.

    value

    Int: Threshold for distinct counts within a numeric feature field.

    Note

    Continuous data fields are eligible for outlier filtering; categorical fields are not. Fields whose distinct count falls below the threshold set by this setter are treated as categorical and will be ignored by the filtering action.

  123. def setOutlierFieldsToIgnore(value: Array[String]): ConfigurationGenerator.this.type

    Permalink

    Setter
    Defines an Array of fields to be ignored from outlier filtering.

    value

    Array[String]: field names to be ignored from outlier filtering.

  124. def setOutlierFilterBounds(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter

    Configures the tails of a distribution to filter out, along with the ntile settings defined in: .setOutlierLowerFilterNTile() and/or .setOutlierUpperFilterNTile()

    Available Modes:
    "lower" -> filters out rows from the data that are below the value set in .setOutlierLowerFilterNTile()
    "upper" -> filters out rows from the data that are above the value set in .setOutlierUpperFilterNTile()
    "both" -> two-tailed filter that combines both an "upper" and "lower" filter.

    value

    String: Tailed direction setting for outlier filtering.

    Note

    This filter action is disabled by default. Before enabling, please ensure the fields to be filtered are adequately reflected in the .setOutlierFieldsToIgnore() inverse selection, and verify the general distribution of the fields that have outlier data in order to select an appropriate NTile value. This feature should only be used in rare instances, and the impact this filter may have should be fully understood before enabling it.

    Default: "both"
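
    The three tail modes can be sketched on a plain collection as follows. This is an illustration only, not the framework's Spark-based implementation: OutlierBoundsSketch, quantile and filter are hypothetical names, and the nearest-rank quantile is an assumed simplification of the NTILE lookup.

```scala
object OutlierBoundsSketch {

  // Nearest-rank quantile on a sorted copy (an assumed simplification of an NTILE lookup).
  def quantile(values: Seq[Double], q: Double): Double = {
    require(values.nonEmpty, "values must not be empty")
    require(q >= 0.0 && q <= 1.0, "q must be within [0.0, 1.0]")
    val sorted = values.sorted
    val idx = math.min(sorted.length - 1, math.max(0, math.ceil(q * sorted.length).toInt - 1))
    sorted(idx)
  }

  // Keeps only the rows inside the tail(s) selected by the bounds mode.
  def filter(values: Seq[Double], bounds: String, lowerNTile: Double, upperNTile: Double): Seq[Double] = {
    val lo = quantile(values, lowerNTile)
    val hi = quantile(values, upperNTile)
    bounds match {
      case "lower" => values.filter(_ >= lo)                 // drop rows below the lower NTile value
      case "upper" => values.filter(_ <= hi)                 // drop rows above the upper NTile value
      case "both"  => values.filter(v => v >= lo && v <= hi) // two-tailed filter
      case other   => throw new IllegalArgumentException(s"Unsupported bounds mode: $other")
    }
  }
}
```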

  125. def setOutlierFilterFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink

    Boolean switch for setting the state of outlierFilterFlag

    value

    Boolean

  126. def setOutlierFilterPrecision(value: Double): ConfigurationGenerator.this.type

    Permalink

    Setter
    Defines the precision (RSD) with which each field's cardinality is calculated using the approx_count_distinct SparkSQL function. Lower values specify higher accuracy, but consume more computational resources.

    value

    Double: In the range 0.0 to 1.0

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException if the value supplied is outside of the Range(0.0, 1.0)

    Note

    A value of 0.0 results in an exact computation of distinct values, which requires all data to be shuffled, an expensive task.

    See also

    https://en.wikipedia.org/wiki/Coefficient_of_variation for explanation of RSD

  127. def setOutlierLowerFilterNTile(value: Double): ConfigurationGenerator.this.type

    Permalink

    Setter
    Defines the lower NTILE value of each feature field's distribution; rows that fall beneath this value will be filtered from the data.

    value

    Double: Lower Threshold boundary NTILE for Outlier Filtering

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException if the value supplied is outside of the Range(0.0,1.0)

    Note

    Only used if Outlier filtering is set to 'On' and Filter Direction is either 'both' or 'lower'

  128. def setOutlierUpperFilterNTile(value: Double): ConfigurationGenerator.this.type

    Permalink

    Setter
    Defines the upper NTILE value of each feature field's distribution; rows that fall above this value will be filtered from the data.

    value

    Double: Upper Threshold boundary NTILE value for Outlier Filtering

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException if the value supplied is outside of the Range(0.0,1.0)

    Note

    Only used if Outlier filtering is set to 'On' and Filter Direction is either 'both' or 'upper'

  129. def setPearsonAutoFilterNTile(value: Double): ConfigurationGenerator.this.type

    Permalink

    Setter
    Provides the ntile threshold above or below which (depending on the PearsonFilterDirection setting) fields will be removed, based on the distribution of Pearson statistics from all feature columns.

    value

    Double: In range of (0.0, 1.0)

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException if the value provided is outside of the range of (0.0, 1.0)

    Note

    Default: 0.75 (Q3)

    WARNING - this feature is ONLY recommended to be used for exploratory development work.

  130. def setPearsonFilterDirection(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter
    Controls which direction of correlation values to filter out. Allowable modes:
    "greater" or "lesser"

    value

    String: one of available modes

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException if the value provided is not in available modes list.

    Note

    Default: greater

  131. def setPearsonFilterFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink

    Boolean switch for setting the state of pearsonFilterFlag

    value

    Boolean

  132. def setPearsonFilterManualValue(value: Double): ConfigurationGenerator.this.type

    Permalink

    Setter
    Controls the Pearson manual filter value, if the PearsonFilterMode is set to "manual"

    value

    Double: A cut-off value. Fields whose correlation statistic is above or below it (depending on the filter direction) will be culled from the feature vector.

    Example:
    With .setPearsonFilterMode("manual") and .setPearsonFilterDirection("greater"), fields that have a Pearson correlation coefficient above this value will be dropped from modeling runs.
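
    A minimal sketch of the manual cut-off, assuming the "lesser" direction mirrors the documented "greater" behavior. PearsonManualFilterSketch, retainedFields and the stats map are hypothetical names used only for illustration; the framework computes the statistics with Spark.

```scala
object PearsonManualFilterSketch {

  // stats: hypothetical map of field name -> Pearson filter statistic.
  // Returns the fields that remain in the feature vector after culling.
  def retainedFields(stats: Map[String, Double], manualValue: Double, direction: String): Set[String] =
    direction match {
      // "greater": fields with a statistic above the cut-off are dropped.
      case "greater" => stats.collect { case (field, s) if s <= manualValue => field }.toSet
      // "lesser": assumed to drop fields with a statistic below the cut-off.
      case "lesser"  => stats.collect { case (field, s) if s >= manualValue => field }.toSet
      case other     => throw new IllegalArgumentException(s"Unsupported direction: $other")
    }
}
```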

  133. def setPearsonFilterMode(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter
    Controls whether to use "auto" mode (using the PearsonAutoFilterNTile) or "manual" mode (using the
    PearsonFilterManualValue) to cull fields from the feature vector.

    value

    String: either "auto" or "manual"

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException if the value provided is not in available modes list (auto and manual)

    Note

    Default: "auto"

  134. def setPearsonFilterStatistic(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter
    Selection for filter statistic to be used in Pearson Filtering.
    Available modes: "pvalue", "degreesFreedom", or "pearsonStat"

    value

    String: one of available modes.

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException if the value provided is not in available modes list.

    Note

    Default: pearsonStat

  135. def setPipelineDebugFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink
  136. def setScalingFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink

    Boolean switch for setting the state of the scalingFlag

    value

    Boolean

  137. def setScalingMax(value: Double): ConfigurationGenerator.this.type

    Permalink
  138. def setScalingMin(value: Double): ConfigurationGenerator.this.type

    Permalink
  139. def setScalingPNorm(value: Double): ConfigurationGenerator.this.type

    Permalink
  140. def setScalingStandardMeanFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink
  141. def setScalingStandardMeanFlagOff(): ConfigurationGenerator.this.type

    Permalink
  142. def setScalingStandardMeanFlagOn(): ConfigurationGenerator.this.type

    Permalink
  143. def setScalingStdDevFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink
  144. def setScalingStdDevFlagOff(): ConfigurationGenerator.this.type

    Permalink
  145. def setScalingStdDevFlagOn(): ConfigurationGenerator.this.type

    Permalink
  146. def setScalingType(value: String): ConfigurationGenerator.this.type

    Permalink
  147. def setSplitCachingStrategy(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter for determining the split caching strategy (either persist to disk for each kfold split or backing to Delta)

    value

    Configuration string either 'persist' or 'delta'

    Since

    0.7.1

  148. def setStringBoundaries(value: Map[String, List[String]]): ConfigurationGenerator.this.type

    Permalink

    Algorithm Config

  149. def setTunerAutoStoppingScore(value: Double): ConfigurationGenerator.this.type

    Permalink

    Tuner Config

  150. def setTunerContinuousEvolutionGeneticMixing(value: Double): ConfigurationGenerator.this.type

    Permalink
  151. def setTunerContinuousEvolutionImprovementThreshold(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter for defining the secondary stopping criterion for continuous training mode (the number of consecutive non-improving runs after which the learning algorithm is terminated due to diminishing returns).

    value

    Negative Integer. An improvement over the prior result resets a mutable counter, and each subsequent non-improvement decrements it. If the counter reaches the limit specified in value, the continuous mode algorithm will stop.

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.0

    Exceptions thrown

    IllegalArgumentException if the value is positive.
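
    The counter behavior described for this setter can be sketched as below. This is an assumed reading of the description, not the tuner's code: ImprovementThresholdSketch and runsBeforeStop are hypothetical names, higher scores are assumed to be better, and the sketch requires a strictly negative threshold.

```scala
object ImprovementThresholdSketch {

  // Returns how many runs are executed before the stopping criterion fires
  // (or the total number of runs if it never fires).
  def runsBeforeStop(scores: Seq[Double], threshold: Int): Int = {
    require(threshold < 0, "threshold must be negative")
    var best = Double.NegativeInfinity
    var counter = 0
    var runs = 0
    var stopped = false
    val it = scores.iterator
    while (it.hasNext && !stopped) {
      val score = it.next()
      runs += 1
      if (score > best) { best = score; counter = 0 } // improvement resets the counter
      else counter -= 1                               // non-improvement decrements it
      if (counter <= threshold) stopped = true        // limit reached: stop
    }
    runs
  }
}
```

    With a threshold of -3, for example, the sketch stops on the third consecutive run that fails to improve on the best score seen so far.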

  152. def setTunerContinuousEvolutionMaxIterations(value: Int): ConfigurationGenerator.this.type

    Permalink
  153. def setTunerContinuousEvolutionMutationAggressiveness(value: Int): ConfigurationGenerator.this.type

    Permalink
  154. def setTunerContinuousEvolutionParallelism(value: Int): ConfigurationGenerator.this.type

    Permalink
  155. def setTunerContinuousEvolutionRollingImprovementCount(value: Int): ConfigurationGenerator.this.type

    Permalink
  156. def setTunerContinuousEvolutionStoppingScore(value: Double): ConfigurationGenerator.this.type

    Permalink
  157. def setTunerDeltaCacheBackingDirectory(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter for providing a path to write the kfold train/test splits as Delta data sets to (useful for extremely large data sets or a situation where using local disk storage might be prohibitively expensive)

    value

    String path to a dbfs location for creating the temporary (or persisted)

    Since

    0.7.1

  158. def setTunerDeltaCacheBackingDirectoryRemovalFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink

    Setter for whether or not to delete the written train/test splits for the run in Delta. Defaults to true, which means that the job will delete the data on the object store to clean up after the run is completed if the splitCachingStrategy is set to 'delta'.

    value

    Boolean - true => delete; false => leave on Object Store

    Since

    0.7.1

  159. def setTunerEvolutionStrategy(value: String): ConfigurationGenerator.this.type

    Permalink
  160. def setTunerFirstGenerationGenePool(value: Int): ConfigurationGenerator.this.type

    Permalink
  161. def setTunerFixedMutationValue(value: Int): ConfigurationGenerator.this.type

    Permalink
  162. def setTunerGenerationalMutationStrategy(value: String): ConfigurationGenerator.this.type

    Permalink
  163. def setTunerGeneticMBOCandidateFactor(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter for defining the factor to be applied to the candidate listing of hyperparameters to generate through mutation for each generation other than the initial and post-modeling optimization phases. The larger this value (default: 10), the more potential space can be searched. There is not a large performance hit to this, and as such, values in excess of 100 are viable.

    value

    Int - a factor to multiply the numberOfMutationsPerGeneration by to generate a count of potential candidates.

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.0

    Exceptions thrown

    IllegalArgumentException if the value is not greater than zero.
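
    As a small worked example of the multiplication described for value, the sketch below computes the size of the candidate pool. CandidateFactorSketch and candidateCount are hypothetical names; only the arithmetic and the greater-than-zero check come from the description above.

```scala
object CandidateFactorSketch {

  // Number of candidate hyperparameter sets generated for one generation.
  def candidateCount(numberOfMutationsPerGeneration: Int, candidateFactor: Int): Int = {
    require(candidateFactor > 0, "candidateFactor must be greater than zero")
    numberOfMutationsPerGeneration * candidateFactor
  }
}
```

    For instance, 8 mutations per generation with the default factor of 10 would yield 80 potential candidates under this reading.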

  164. def setTunerGeneticMBORegressorType(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter for selecting the type of Regressor to use for the within-epoch generation MBO of candidates

    value

    String - one of "XGBoost", "LinearRegression" or "RandomForest"

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.0

    Exceptions thrown

    IllegalArgumentException if the value is not supported

  165. def setTunerGeneticMixing(value: Double): ConfigurationGenerator.this.type

    Permalink
  166. def setTunerHyperSpaceInferenceCount(value: Int): ConfigurationGenerator.this.type

    Permalink
  167. def setTunerHyperSpaceInferenceFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink
  168. def setTunerHyperSpaceInferenceOff(): ConfigurationGenerator.this.type

    Permalink
  169. def setTunerHyperSpaceInferenceOn(): ConfigurationGenerator.this.type

    Permalink
  170. def setTunerHyperSpaceModelCount(value: Int): ConfigurationGenerator.this.type

    Permalink
  171. def setTunerHyperSpaceModelType(value: String): ConfigurationGenerator.this.type

    Permalink
  172. def setTunerInitialGenerationArraySeed(value: Long): ConfigurationGenerator.this.type

    Permalink
  173. def setTunerInitialGenerationIndexMixingMode(value: String): ConfigurationGenerator.this.type

    Permalink
  174. def setTunerInitialGenerationMode(value: String): ConfigurationGenerator.this.type

    Permalink
  175. def setTunerInitialGenerationPermutationCount(value: Int): ConfigurationGenerator.this.type

    Permalink
  176. def setTunerKFold(value: Int): ConfigurationGenerator.this.type

    Permalink
  177. def setTunerKSampleCardinalityThreshold(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter - for overriding the cardinality threshold above which an exception is thrown. [WARNING] Increasing this value on a sufficiently large data set could incur, during runtime, excessive memory and CPU pressure on the cluster.

    value

    Int: the limit above which an exception will be thrown for a classification problem wherein the label distinct count is too large to successfully generate synthetic data.

    Since

    0.5.1

    Note

    Default: 20

  178. def setTunerKSampleKGroups(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter for specifying the number of K-Groups to generate in the KMeans model

    value

    Int: number of k groups to generate

    returns

    this

  179. def setTunerKSampleKMeansDistanceMeasurement(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter for which distance measurement to use to calculate the nearness of vectors to a centroid

    value

    String: Options -> "euclidean" or "cosine" Default: "euclidean"

    returns

    this

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if an invalid value is entered

  180. def setTunerKSampleKMeansMaxIter(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter for specifying the maximum number of iterations for the KMeans model to go through to converge

    value

    Int: Maximum limit on iterations

    returns

    this

  181. def setTunerKSampleKMeansPredictionCol(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter for the internal KMeans column for cluster membership attribution

    value

    String: column name for internal algorithm column for group membership

    returns

    this

  182. def setTunerKSampleKMeansSeed(value: Long): ConfigurationGenerator.this.type

    Permalink

    Setter for a KMeans seed for the clustering algorithm

    value

    Long: Seed value

    returns

    this

  183. def setTunerKSampleKMeansTolerance(value: Double): ConfigurationGenerator.this.type

    Permalink

    Setter for the tolerance for KMeans (must be >0)

    value

    The tolerance value setting for KMeans

    returns

    this

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if a value less than 0 is entered

    See also

    reference: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.clustering.KMeans for further details.

  184. def setTunerKSampleLSHHashTables(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter for Configuring the number of Hash Tables to use for MinHashLSH

    value

    Int: Count of hash tables to use

    returns

    this

    See also

    http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.MinHashLSH for more information

  185. def setTunerKSampleLSHOutputCol(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter for the internal LSH output hash information column

    value

    String: column name for the internal MinHashLSH Model transformation value

    returns

    this

  186. def setTunerKSampleLSHSeed(value: Long): ConfigurationGenerator.this.type

    Permalink
  187. def setTunerKSampleLabelBalanceMode(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter - for determining the label balance approach mode.

    value

    String: one of: 'match', 'percentage' or 'target'

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.5.1

    Exceptions thrown

    IllegalArgumentException if the provided mode is not supported.

    Note

    Default: "percentage"

    Available modes:
    'match': Will match all smaller class counts to largest class count. [WARNING] - May significantly increase memory pressure!
    'percentage': Will adjust smaller classes to a percentage value of the largest class count.
    'target': Will increase smaller class counts to a fixed numeric target of rows.
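
    The effect of the three modes on per-class row counts can be sketched as follows. This is an illustration under stated assumptions, not the KSample implementation: LabelBalanceSketch and targetCounts are hypothetical names, rounding up in 'percentage' mode is assumed, and classes already at or above the goal are assumed to be left unchanged.

```scala
object LabelBalanceSketch {

  // classCounts: hypothetical map of label -> current row count.
  // Returns the row count each class would have after synthetic rows are added.
  def targetCounts(
      classCounts: Map[String, Long],
      mode: String,
      ratio: Double = 0.2,
      target: Long = 0L
  ): Map[String, Long] = {
    require(classCounts.nonEmpty, "classCounts must not be empty")
    val largest = classCounts.values.max
    val goal: Long = mode match {
      case "match"      => largest                           // match the largest class
      case "percentage" => math.ceil(largest * ratio).toLong // fraction of the largest class
      case "target"     => target                            // fixed row target
      case other        => throw new IllegalArgumentException(s"Unsupported mode: $other")
    }
    // Classes already at or above the goal are left unchanged (an assumption).
    classCounts.map { case (label, n) => label -> math.max(n, goal) }
  }
}
```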

  188. def setTunerKSampleMinimumVectorCountToMutate(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter for minimum threshold for vector indexes to mutate within the feature vector.

    value

    The minimum (or fixed) number of indexes to mutate.

    returns

    this

    Note

    In vectorMutationMethod "fixed" this sets the fixed count of how many vector positions to mutate. In vectorMutationMethod "random" this sets the lower threshold for 'at least this many indexes will be mutated'

  189. def setTunerKSampleMutationMode(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter for the Mutation Mode of the feature vector individual values

    value

    String: the mode to use.

    returns

    this

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if the mode is not supported.

    Note

    Options:
    "weighted" - uses weighted averaging to scale the euclidean distance between the centroid vector and mutation candidate vectors
    "random" - randomly selects a position on the euclidean vector between the centroid vector and the candidate mutation vectors
    "ratio" - uses a ratio between the values of the centroid vector and the mutation vector

  190. def setTunerKSampleMutationValue(value: Double): ConfigurationGenerator.this.type

    Permalink

    Setter for specifying the mutation magnitude for the modes 'weighted' and 'ratio' in mutationMode

    value

    Double: value between 0 and 1 for mutation magnitude adjustment.

    returns

    this

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if the value specified is outside of the range (0, 1)

    Note

    the higher this value, the closer to the centroid vector vs. the candidate mutation vector the synthetic row data will be.

  191. def setTunerKSampleNumericRatio(value: Double): ConfigurationGenerator.this.type

    Permalink

    Setter - for specifying the percentage ratio for the mode 'percentage' in setTunerKSampleLabelBalanceMode()

    value

    Double: A fractional double in the range of 0.0 to 1.0.

    Annotations
    @throws( ... )
    Since

    0.5.1

    Exceptions thrown

    UnsupportedOperationException() if the provided value is outside of the range of 0.0 -> 1.0

    Note

    Default: 0.2

    Setting this value to 1.0 is equivalent to setting the label balance mode to 'match'

  192. def setTunerKSampleNumericTarget(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter - for specifying the target row count to generate for 'target' mode in setTunerKSampleLabelBalanceMode()

    value

    Int: The desired final number of rows per minority class label

    Since

    0.5.1

    Note

    [WARNING] Setting this value to too high of a number will greatly increase runtime and memory pressure.

  193. def setTunerKSampleQuorumCount(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter for how many vectors to find in adjacency to the centroid for generation of synthetic data

    value

    Int: Number of vectors to find nearest each centroid within the class

    returns

    this

    Note

    the higher the value set here, the higher the variance in synthetic data generation

  194. def setTunerKSampleSyntheticCol(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter - for setting the name of the synthetic column

    value

    String: A column name that is uniquely not part of the main DataFrame

    Since

    0.5.1

  195. def setTunerKSampleVectorMutationMethod(value: String): ConfigurationGenerator.this.type

    Permalink

    Setter for the Vector Mutation Method

    value

    String - the mode to use.

    returns

    this

    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if the mode is not supported.

    Note

    Options:
    "fixed" - will use the value of minimumVectorCountToMutate to select random indexes of this number of indexes.
    "random" - will use this number as a lower bound on a random selection of indexes between this and the vector length.
    "all" - will mutate all of the vectors.
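
    One way to picture the three methods is as a choice of which vector positions to mutate. The sketch below is an assumed reading, not the library's code: VectorMutationSketch and indexesToMutate are hypothetical names, and "all" is interpreted here as selecting every index of the feature vector.

```scala
import scala.util.Random

object VectorMutationSketch {

  // Returns the set of vector positions selected for mutation.
  def indexesToMutate(vectorLength: Int, method: String, minimumCount: Int, rng: Random): Set[Int] = {
    require(minimumCount >= 0 && minimumCount <= vectorLength,
      "minimumCount must be within the vector length")
    val all = (0 until vectorLength).toList
    method match {
      // "fixed": exactly minimumCount randomly chosen positions.
      case "fixed" => rng.shuffle(all).take(minimumCount).toSet
      // "random": a random count between minimumCount and vectorLength, inclusive.
      case "random" =>
        val n = minimumCount + rng.nextInt(vectorLength - minimumCount + 1)
        rng.shuffle(all).take(n).toSet
      // "all": every position (assumed interpretation).
      case "all" => all.toSet
      case other => throw new IllegalArgumentException(s"Unsupported method: $other")
    }
  }
}
```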

  196. def setTunerModelSeed(value: Map[String, Any]): ConfigurationGenerator.this.type

    Permalink
  197. def setTunerMutationMagnitudeMode(value: String): ConfigurationGenerator.this.type

    Permalink
  198. def setTunerNumberOfGenerations(value: Int): ConfigurationGenerator.this.type

    Permalink
  199. def setTunerNumberOfMutationsPerGeneration(value: Int): ConfigurationGenerator.this.type

    Permalink
  200. def setTunerNumberOfParentsToRetain(value: Int): ConfigurationGenerator.this.type

    Permalink
  201. def setTunerOutputDfRepartitionScaleFactor(value: Int): ConfigurationGenerator.this.type

    Permalink
  202. def setTunerParallelism(value: Int): ConfigurationGenerator.this.type

    Permalink
  203. def setTunerSeed(value: Long): ConfigurationGenerator.this.type

    Permalink
  204. def setTunerTrainPortion(value: Double): ConfigurationGenerator.this.type

    Permalink
  205. def setTunerTrainSplitChronologicalColumn(value: String): ConfigurationGenerator.this.type

    Permalink
  206. def setTunerTrainSplitChronologicalRandomPercentage(value: Double): ConfigurationGenerator.this.type

    Permalink
  207. def setTunerTrainSplitMethod(value: String): ConfigurationGenerator.this.type

    Permalink
  208. def setVarianceFilterFlag(value: Boolean): ConfigurationGenerator.this.type

    Permalink

    Boolean switch for setting the state of varianceFilterFlag

    value

    Boolean (whether or not to filter out fields from the feature vector that all have the same value)

  209. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  210. def toString(): String

    Permalink
    Definition Classes
    AnyRef → Any
  211. def varianceFilterOff(): ConfigurationGenerator.this.type

    Permalink

    Boolean switch for turning variance filtering off

    Note

    Default: On

  212. def varianceFilterOn(): ConfigurationGenerator.this.type

    Permalink

    Boolean switch for turning variance filtering on

    Note

    Default: On

  213. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  214. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  215. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def setFillConfigModelSelectionDistinctThreshold(value: Int): ConfigurationGenerator.this.type

    Permalink

    Setter
    The threshold value that is used to detect, based on the supplied labelCol, the cardinality of the label through a .distinct().count() being issued to the label column. Values from this cardinality determination that are above this setter's value will be considered to be a Regression Task, those below will be considered a Classification Task.

    value

    Int: Threshold value for the labelCol cardinality check. Values above this setting will be determined to be a regression task; below to be a classification task.

    Annotations
    @deprecated
    Deprecated
    Note

    Default: 50

    If exceptions are thrown for an incorrect type (a classifier was detected, but the intended usage is regression), lower this value. Conversely, if a classification problem has a significant number of classes, above the default threshold of this setting (50), increase this value.
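
    The cardinality check can be pictured with the following sketch. TaskDetectionSketch and detect are hypothetical names, and treating a distinct count equal to the threshold as a classification task is an assumption, since the description only covers values above and below it.

```scala
object TaskDetectionSketch {

  // labelDistinctCount: result of a distinct count on the label column.
  def detect(labelDistinctCount: Long, threshold: Int = 50): String =
    if (labelDistinctCount > threshold) "regressor" else "classifier"
}
```

    Under this reading, a label with 120 distinct values is detected as a regression task at the default of 50, and raising the threshold to 200 would switch the same label to a classification task.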
