Class

com.databricks.labs.automl.executor

DataPrep

Related Doc: package executor

class DataPrep extends AutomationConfig with AutomationTools
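
The declaration above mixes a configuration layer and a tools layer into one class. As a rough, self-contained illustration of that composition (not the library's code), the sketch below stacks three hypothetical traits: `DefaultsSketch`, `ConfigSketch`, and `ToolsSketch`. The default values `"label"` and `"features"` and the helper `removedFields` are assumptions made for the example only.

```scala
// Hypothetical miniature of the hierarchy; names ending in "Sketch" are not library types.
trait DefaultsSketch {
  // Assumed default values, chosen only for illustration.
  def defaultLabelCol: String = "label"
  def defaultFeaturesCol: String = "features"
}

trait ConfigSketch extends DefaultsSketch {
  // Mutable settings start from the inherited defaults.
  var labelCol: String = defaultLabelCol
  var featuresCol: String = defaultFeaturesCol
}

trait ToolsSketch {
  // Hypothetical helper: fields present before a filter step but absent afterwards.
  def removedFields(pre: Array[String], post: Array[String]): List[String] =
    pre.toList.filterNot(f => post.contains(f))
}

// The concrete class receives members from both layers, as in
// `extends AutomationConfig with AutomationTools`.
class PrepSketch extends ConfigSketch with ToolsSketch
```

An instance of `PrepSketch` exposes `labelCol`, `featuresCol`, and `removedFields` together, which is why a member listing for a class composed this way is dominated by inherited members.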

Linear Supertypes
  1. AutomationTools
  2. SparkSessionWrapper
  3. Serializable
  4. Serializable
  5. AutomationConfig
  6. SanitizerDefaults
  7. Defaults
  8. AnyRef
  9. Any

Instance Constructors

  1. new DataPrep(df: DataFrame)
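
Many of the members listed for this class are paired toggles such as `autoStoppingOn()` and `autoStoppingOff()` whose return type is `DataPrep.this.type`, alongside getters such as `getAutoStoppingFlag`. The following is a minimal sketch of that fluent pattern, assuming nothing about the library's internals: `ToggleConfigSketch` and `DataPrepSketch` are hypothetical stand-ins, a `Seq` of maps replaces the `DataFrame` argument, and the initial flag values are invented.

```scala
// Hypothetical stand-in for the configuration layer; not the library's code.
trait ToggleConfigSketch {
  // Initial values are assumptions for the example.
  private var autoStoppingFlag: Boolean = false
  private var dataPrepCachingFlag: Boolean = true

  // Returning `this.type` preserves the static type of the concrete subclass,
  // so calls can be chained without losing subclass-specific members.
  def autoStoppingOn(): this.type = { autoStoppingFlag = true; this }
  def autoStoppingOff(): this.type = { autoStoppingFlag = false; this }
  def dataPrepCachingOn(): this.type = { dataPrepCachingFlag = true; this }
  def dataPrepCachingOff(): this.type = { dataPrepCachingFlag = false; this }

  def getAutoStoppingFlag: Boolean = autoStoppingFlag
  def getDataPrepCachingStatus: Boolean = dataPrepCachingFlag
}

// Hypothetical subclass standing in for DataPrep; `rows` replaces the DataFrame argument.
class DataPrepSketch(val rows: Seq[Map[String, Any]]) extends ToggleConfigSketch {
  def rowCount: Int = rows.size
}

object FluentToggleDemo {
  def main(args: Array[String]): Unit = {
    val prep = new DataPrepSketch(Seq(Map("label" -> 1)))
      .autoStoppingOn()
      .dataPrepCachingOff()
    // `prep` is still typed as DataPrepSketch, so `rowCount` is reachable after chaining.
    println(s"${prep.getAutoStoppingFlag} ${prep.getDataPrepCachingStatus} ${prep.rowCount}")
  }
}
```

Had the toggles been declared to return the trait type instead of `this.type`, the chained expression would be typed as `ToggleConfigSketch` and `rowCount` would no longer compile.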

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final val _allowableEvolutionStrategies: List[String]

    Definition Classes
    Defaults
  5. final val _allowableInitialGenerationIndexMixingModes: List[String]

    Definition Classes
    Defaults
  6. final val _allowableInitialGenerationModes: List[String]

    Definition Classes
    Defaults
  7. final val _allowableMlFlowLoggingModes: List[String]

    Definition Classes
    Defaults
  8. final val _allowableNAFillModes: List[String]

    Definition Classes
    Defaults
  9. final val _allowedFilterDirections: Array[String]

    Definition Classes
    SanitizerDefaults
  10. final val _allowedFilterModes: Array[String]

    Definition Classes
    SanitizerDefaults
  11. final val _allowedStats: Array[String]

    Pearson Defaults

    Definition Classes
    SanitizerDefaults
  12. var _autoStoppingFlag: Boolean

    Definition Classes
    AutomationConfig
  13. var _autoStoppingScore: Double

    Definition Classes
    AutomationConfig
  14. var _cardinalityCheckMode: String

    Definition Classes
    AutomationConfig
  15. var _cardinalityLimit: Int

    Definition Classes
    AutomationConfig
  16. var _cardinalityPrecision: Double

    Definition Classes
    AutomationConfig
  17. var _cardinalitySwitchFlag: Boolean

    Definition Classes
    AutomationConfig
  18. var _cardinalityThreshold: Int

    Definition Classes
    AutomationConfig
  19. var _cardinalityType: String

    Definition Classes
    AutomationConfig
  20. var _categoricalNAFillMap: Map[String, String]

    Definition Classes
    AutomationConfig
  21. var _characterFillStat: String

    Definition Classes
    AutomationConfig
  22. var _characterNABlanketFillValue: String

    Definition Classes
    AutomationConfig
  23. var _continuousDataThreshold: Int

    Definition Classes
    AutomationConfig
  24. var _continuousEvolutionGeneticMixing: Double

    Definition Classes
    AutomationConfig
  25. var _continuousEvolutionImprovementThreshold: Int

    Definition Classes
    AutomationConfig
  26. var _continuousEvolutionMaxIterations: Int

    Definition Classes
    AutomationConfig
  27. var _continuousEvolutionMutationAggressiveness: Int

    Definition Classes
    AutomationConfig
  28. var _continuousEvolutionParallelism: Int

    Definition Classes
    AutomationConfig
  29. var _continuousEvolutionRollingImprovementCount: Int

    Definition Classes
    AutomationConfig
  30. var _continuousEvolutionStoppingScore: Double

    Definition Classes
    AutomationConfig
  31. var _correlationCutoffHigh: Double

    Definition Classes
    AutomationConfig
  32. var _correlationCutoffLow: Double

    Definition Classes
    AutomationConfig
  33. var _covarianceConfig: CovarianceConfig

    Definition Classes
    AutomationConfig
  34. def _covarianceConfigDefaults: CovarianceConfig

    Definition Classes
    Defaults
  35. var _covarianceFilterFlag: Boolean

    Definition Classes
    AutomationConfig
  36. var _dataPrepCachingFlag: Boolean

    Definition Classes
    AutomationConfig
  37. def _dataPrepConfigDefaults: DataPrepConfig

    Definition Classes
    Defaults
  38. var _dataPrepParallelism: Int

    Definition Classes
    AutomationConfig
  39. var _dataReductionFactor: Double

    Definition Classes
    AutomationConfig
  40. var _dateTimeConversionType: String

    Definition Classes
    AutomationConfig
  41. def _defaultAutoStoppingFlag: Boolean

    Definition Classes
    Defaults
  42. def _defaultAutoStoppingScore: Double

    Definition Classes
    Defaults
  43. def _defaultCovarianceFilterFlag: Boolean

    Definition Classes
    Defaults
  44. def _defaultDataPrepCachingFlag: Boolean

    Definition Classes
    Defaults
  45. def _defaultDataPrepParallelism: Int

    Definition Classes
    Defaults
  46. def _defaultDataReductionFactor: Double

    Definition Classes
    Defaults
  47. def _defaultDateTimeConversionType: String

    Definition Classes
    Defaults
  48. def _defaultFeatureImportanceCutoffType: String

    Definition Classes
    Defaults
  49. def _defaultFeatureImportanceCutoffValue: Double

    Definition Classes
    Defaults
  50. def _defaultFeatureInteractionConfig: FeatureInteractionConfig

    Definition Classes
    Defaults
  51. def _defaultFeatureInteractionFlag: Boolean

    Definition Classes
    Defaults
  52. def _defaultFeaturesCol: String

    Definition Classes
    Defaults
  53. def _defaultFieldsToIgnoreInVector: Array[String]

    Definition Classes
    Defaults
  54. def _defaultFirstGenerationConfig: FirstGenerationConfig

    Definition Classes
    Defaults
  55. def _defaultHyperSpaceInference: Boolean

    Definition Classes
    Defaults
  56. def _defaultHyperSpaceInferenceCount: Int

    Definition Classes
    Defaults
  57. def _defaultHyperSpaceModelCount: Int

    Definition Classes
    Defaults
  58. def _defaultHyperSpaceModelType: String

    Definition Classes
    Defaults
  59. def _defaultInitialGenerationMode: String

    Definition Classes
    Defaults
  60. def _defaultKSampleConfig: KSampleConfig

    Definition Classes
    Defaults
  61. def _defaultLabelCol: String

    Definition Classes
    Defaults
  62. def _defaultMlFlowArtifactsFlag: Boolean

    Definition Classes
    Defaults
  63. def _defaultMlFlowLoggingFlag: Boolean

    Definition Classes
    Defaults
  64. def _defaultModelingFamily: String

    Definition Classes
    Defaults
  65. def _defaultNAFillFlag: Boolean

    Definition Classes
    Defaults
  66. def _defaultOneHotEncodeFlag: Boolean

    Definition Classes
    Defaults
  67. def _defaultOutlierFilterFlag: Boolean

    Definition Classes
    Defaults
  68. def _defaultPearsonFilterFlag: Boolean

    Definition Classes
    Defaults
  69. def _defaultPipelineDebugFlag: Boolean

    Definition Classes
    Defaults
  70. def _defaultPipelineId: String

    Definition Classes
    Defaults
  71. def _defaultScalingFlag: Boolean

    Definition Classes
    Defaults
  72. def _defaultVarianceFilterFlag: Boolean

    Definition Classes
    Defaults
  73. var _deltaCacheBackingDirectory: String

    Definition Classes
    AutomationConfig
  74. var _deltaCacheBackingDirectoryRemovalFlag: Boolean

    Definition Classes
    AutomationConfig
  75. var _evolutionStrategy: String

    Definition Classes
    AutomationConfig
  76. var _featureImportanceCutoffType: String

    Definition Classes
    AutomationConfig
  77. var _featureImportanceCutoffValue: Double

    Definition Classes
    AutomationConfig
  78. var _featureImportancesConfig: MainConfig

    Definition Classes
    AutomationConfig
  79. def _featureImportancesDefaults: MainConfig

    Definition Classes
    Defaults
  80. var _featureInteractionConfig: FeatureInteractionConfig

    Definition Classes
    AutomationConfig
  81. var _featureInteractionContinuousDiscretizerBucketCount: Int

    Definition Classes
    AutomationConfig
  82. var _featureInteractionFlag: Boolean

    Definition Classes
    AutomationConfig
  83. var _featureInteractionParallelism: Int

    Definition Classes
    AutomationConfig
  84. var _featureInteractionRetentionMode: String

    Definition Classes
    AutomationConfig
  85. var _featureInteractionTargetInteractionPercentage: Double

    Definition Classes
    AutomationConfig
  86. var _featuresCol: String

    Definition Classes
    AutomationConfig
  87. var _fieldsToIgnore: Array[String]

    Definition Classes
    AutomationConfig
  88. var _fieldsToIgnoreInVector: Array[String]

    Definition Classes
    AutomationConfig
  89. var _fillConfig: FillConfig

    Definition Classes
    AutomationConfig
  90. def _fillConfigDefaults: FillConfig

    Definition Classes
    Defaults
  91. var _filterBounds: String

    Definition Classes
    AutomationConfig
  92. var _filterPrecision: Double

    Definition Classes
    AutomationConfig
  93. var _firstGenerationArraySeed: Long

    Definition Classes
    AutomationConfig
  94. var _firstGenerationConfig: FirstGenerationConfig

    Definition Classes
    AutomationConfig
  95. var _firstGenerationGenePool: Int

    Definition Classes
    AutomationConfig
  96. var _firstGenerationIndexMixingMode: String

    Definition Classes
    AutomationConfig
  97. var _firstGenerationMode: String

    Definition Classes
    AutomationConfig
  98. var _firstGenerationPermutationCount: Int

    Definition Classes
    AutomationConfig
  99. var _fixedMutationValue: Int

    Definition Classes
    AutomationConfig
  100. def _gbtDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  101. def _gbtDefaultStringBoundaries: Map[String, List[String]]

    Definition Classes
    Defaults
  102. var _generationalMutationStrategy: String

    Definition Classes
    AutomationConfig
  103. var _geneticConfig: GeneticConfig

    Definition Classes
    AutomationConfig
  104. var _geneticMBOCandidateFactor: Int

    Definition Classes
    AutomationConfig
  105. var _geneticMBORegressorType: String

    Definition Classes
    AutomationConfig
  106. var _geneticMixing: Double

    Definition Classes
    AutomationConfig
  107. def _geneticTunerDefaults: GeneticConfig

    Definition Classes
    Defaults
  108. var _hyperSpaceInference: Boolean

    Definition Classes
    AutomationConfig
  109. var _hyperSpaceInferenceCount: Int

    Definition Classes
    AutomationConfig
  110. var _hyperSpaceModelCount: Int

    Definition Classes
    AutomationConfig
  111. var _hyperSpaceModelType: String

    Definition Classes
    AutomationConfig
  112. var _inferenceConfigSaveLocation: String

    Definition Classes
    AutomationConfig
  113. def _inferenceConfigSaveLocationDefault: String

    Definition Classes
    Defaults
  114. var _kFold: Int

    Definition Classes
    AutomationConfig
  115. var _kGroups: Int

    Definition Classes
    AutomationConfig
  116. var _kMeansDistanceMeasurement: String

    Definition Classes
    AutomationConfig
  117. var _kMeansMaxIter: Int

    Definition Classes
    AutomationConfig
  118. var _kMeansPredictionCol: String

    Definition Classes
    AutomationConfig
  119. var _kMeansSeed: Long

    Definition Classes
    AutomationConfig
  120. var _kMeansTolerance: Double

    Definition Classes
    AutomationConfig
  121. var _kSampleConfig: KSampleConfig

    Definition Classes
    AutomationConfig
  122. var _labelBalanceMode: String

    Definition Classes
    AutomationConfig
  123. var _labelCol: String

    Definition Classes
    AutomationConfig
  124. def _lightGBMDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  125. def _lightGBMDefaultStringBoundaries: Map[String, List[String]]

    Definition Classes
    Defaults
  126. def _linearRegressionDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  127. def _linearRegressionDefaultStringBoundaries: Map[String, List[String]]

    Definition Classes
    Defaults
  128. def _logisticRegressionDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  129. def _logisticRegressionDefaultStringBoundaries: Map[String, List[String]]

    Definition Classes
    Defaults
  130. var _lowerFilterNTile: Double

    Definition Classes
    AutomationConfig
  131. var _lshHashTables: Int

    Definition Classes
    AutomationConfig
  132. var _lshOutputCol: String

    Definition Classes
    AutomationConfig
  133. var _lshSeed: Long

    Definition Classes
    AutomationConfig
  134. var _mainConfig: MainConfig

    Definition Classes
    AutomationConfig
  135. def _mainConfigDefaults: MainConfig

    Definition Classes
    Defaults
  136. var _minimumVectorCountToMutate: Int

    Definition Classes
    AutomationConfig
  137. var _mlFlowAPIToken: String

    Definition Classes
    AutomationConfig
  138. var _mlFlowArtifactsFlag: Boolean

    Definition Classes
    AutomationConfig
  139. var _mlFlowBestSuffix: String

    Definition Classes
    AutomationConfig
  140. var _mlFlowConfig: MLFlowConfig

    Definition Classes
    AutomationConfig
  141. def _mlFlowConfigDefaults: MLFlowConfig

    Definition Classes
    Defaults
  142. var _mlFlowCustomRunTags: Map[String, String]

    Definition Classes
    AutomationConfig
  143. var _mlFlowExperimentName: String

    Definition Classes
    AutomationConfig
  144. var _mlFlowLoggingFlag: Boolean

    Definition Classes
    AutomationConfig
  145. var _mlFlowLoggingMode: String

    Definition Classes
    AutomationConfig
  146. var _mlFlowModelSaveDirectory: String

    Definition Classes
    AutomationConfig
  147. var _mlFlowTrackingURI: String

    Definition Classes
    AutomationConfig
  148. def _mlpcDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  149. def _mlpcDefaultStringBoundaries: Map[String, List[String]]

    Definition Classes
    Defaults
  150. var _modelSeedMap: Map[String, Any]

    Definition Classes
    AutomationConfig
  151. var _modelSeedSetStatus: Boolean

    Definition Classes
    AutomationConfig
  152. var _modelSelectionDistinctThreshold: Int

    Definition Classes
    AutomationConfig
  153. def _modelTypeDefault: String

    Definition Classes
    Defaults
  154. var _modelingFamily: String

    Definition Classes
    AutomationConfig
  155. var _mutationMagnitudeMode: String

    Definition Classes
    AutomationConfig
  156. var _mutationMode: String

    Definition Classes
    AutomationConfig
  157. var _mutationValue: Double

    Definition Classes
    AutomationConfig
  158. var _naFillFilterPrecision: Double

    Definition Classes
    AutomationConfig
  159. var _naFillFlag: Boolean

    Definition Classes
    AutomationConfig
  160. var _naFillMode: String

    Definition Classes
    AutomationConfig
  161. def _naiveBayesDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  162. def _naiveBayesDefaultStringBoundaries: Map[String, List[String]]

    Definition Classes
    Defaults
  163. var _numberOfGenerations: Int

    Definition Classes
    AutomationConfig
  164. var _numberOfMutationsPerGeneration: Int

    Definition Classes
    AutomationConfig
  165. var _numberOfParentsToRetain: Int

    Definition Classes
    AutomationConfig
  166. var _numericBoundaries: Map[String, (Double, Double)]

    Definition Classes
    AutomationConfig
  167. var _numericFillStat: String

    Definition Classes
    AutomationConfig
  168. var _numericNABlanketFillValue: Double

    Definition Classes
    AutomationConfig
  169. var _numericNAFillMap: Map[String, AnyVal]

    Definition Classes
    AutomationConfig
  170. var _numericRatio: Double

    Definition Classes
    AutomationConfig
  171. var _numericTarget: Int

    Definition Classes
    AutomationConfig
  172. var _oneHotEncodeFlag: Boolean

    Definition Classes
    AutomationConfig
  173. var _outlierConfig: OutlierConfig

    Definition Classes
    AutomationConfig
  174. def _outlierConfigDefaults: OutlierConfig

    Definition Classes
    Defaults
  175. var _outlierFilterFlag: Boolean

    Definition Classes
    AutomationConfig
  176. var _outputDfRepartitionScaleFactor: Int

    Definition Classes
    AutomationConfig
  177. var _pNorm: Double

    Definition Classes
    AutomationConfig
  178. var _parallelism: Int

    Definition Classes
    AutomationConfig
  179. var _pearsonAutoFilterNTile: Double

    Definition Classes
    AutomationConfig
  180. var _pearsonConfig: PearsonConfig

    Definition Classes
    AutomationConfig
  181. def _pearsonConfigDefaults: PearsonConfig

    Definition Classes
    Defaults
  182. var _pearsonFilterDirection: String

    Definition Classes
    AutomationConfig
  183. var _pearsonFilterFlag: Boolean

    Definition Classes
    AutomationConfig
  184. var _pearsonFilterManualValue: Double

    Definition Classes
    AutomationConfig
  185. var _pearsonFilterMode: String

    Definition Classes
    AutomationConfig
  186. var _pearsonFilterStatistic: String

    Definition Classes
    AutomationConfig
  187. var _pipelineDebugFlag: Boolean

    Definition Classes
    AutomationConfig
  188. var _pipelineId: String

    Definition Classes
    AutomationConfig
  189. var _quorumCount: Int

    Definition Classes
    AutomationConfig
  190. def _rfDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  191. def _rfDefaultStringBoundaries: Map[String, List[String]]

    Definition Classes
    Defaults
  192. var _scalerMax: Double

    Definition Classes
    AutomationConfig
  193. var _scalerMin: Double

    Definition Classes
    AutomationConfig
  194. var _scalerType: String

    Definition Classes
    AutomationConfig
  195. var _scalingConfig: ScalingConfig

    Definition Classes
    AutomationConfig
  196. def _scalingConfigDefaults: ScalingConfig

    Definition Classes
    Defaults
  197. var _scalingFlag: Boolean

    Definition Classes
    AutomationConfig
  198. def _scoringDefaultClassifier: String

    Definition Classes
    Defaults
  199. def _scoringDefaultRegressor: String

    Definition Classes
    Defaults
  200. var _scoringMetric: String

    Definition Classes
    AutomationConfig
  201. var _scoringOptimizationStrategy: String

    Definition Classes
    AutomationConfig
  202. def _scoringOptimizationStrategyClassifier: String

    Definition Classes
    Defaults
  203. def _scoringOptimizationStrategyRegressor: String

    Definition Classes
    Defaults
  204. var _seed: Long

    Definition Classes
    AutomationConfig
  205. var _splitCachingStrategy: String

    Definition Classes
    AutomationConfig
  206. var _standardScalerMeanFlag: Boolean

    Definition Classes
    AutomationConfig
  207. var _standardScalerStdDevFlag: Boolean

    Definition Classes
    AutomationConfig
  208. var _stringBoundaries: Map[String, List[String]]

    Definition Classes
    AutomationConfig
  209. final val _supportedFeatureImportanceCutoffTypes: List[String]

    Definition Classes
    Defaults
  210. final val _supportedModels: Array[String]

    Definition Classes
    Defaults
  211. def _svmDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  212. def _svmDefaultStringBoundaries: Map[String, List[String]]

    Definition Classes
    Defaults
  213. var _syntheticCol: String

    Definition Classes
    AutomationConfig
  214. var _trainPortion: Double

    Definition Classes
    AutomationConfig
  215. var _trainSplitChronologicalColumn: String

    Definition Classes
    AutomationConfig
  216. var _trainSplitChronologicalRandomPercentage: Double

    Definition Classes
    AutomationConfig
  217. var _trainSplitColumnSet: Boolean

    Definition Classes
    AutomationConfig
  218. var _trainSplitMethod: String

    Definition Classes
    AutomationConfig
  219. def _treeSplitDefaults: MainConfig

    Definition Classes
    Defaults
  220. var _treeSplitsConfig: MainConfig

    Definition Classes
    AutomationConfig
  221. def _treesDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  222. def _treesDefaultStringBoundaries: Map[String, List[String]]

    Definition Classes
    Defaults
  223. var _upperFilterNTile: Double

    Definition Classes
    AutomationConfig
  224. var _varianceFilterFlag: Boolean

    Definition Classes
    AutomationConfig
  225. var _vectorMutationMethod: String

    Definition Classes
    AutomationConfig
  226. def _xgboostDefaultNumBoundaries: Map[String, (Double, Double)]

    Definition Classes
    Defaults
  227. final val allowableCardinalilties: List[String]

    Definition Classes
    Defaults
  228. final val allowableCategoricalFilterModes: List[String]

    Definition Classes
    Defaults
  229. final val allowableDateTimeConversions: List[String]

    Definition Classes
    Defaults
  230. final val allowableFeatureInteractionModes: List[String]

    Definition Classes
    Defaults
  231. final val allowableKMeansDistanceMeasurements: List[String]

    Definition Classes
    Defaults
  232. final val allowableLabelBalanceModes: List[String]

    Definition Classes
    Defaults
  233. final val allowableMBORegressorTypes: List[String]

    Definition Classes
    Defaults
  234. final val allowableMutationModes: List[String]

    Definition Classes
    Defaults
  235. final val allowableScalers: Array[String]

    Scaler Defaults

    Definition Classes
    SanitizerDefaults
  236. final val allowableVectorMutationMethods: List[String]

    Definition Classes
    Defaults
  237. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  238. def autoStoppingOff(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  239. def autoStoppingOn(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  240. def cardinalitySwitchOff(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  241. def cardinalitySwitchOn(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  242. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  243. def covarianceFilterOff(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  244. def covarianceFilterOn(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  245. def dataPersist(preDF: DataFrame, postDF: DataFrame, cacheLevel: StorageLevel, blockUnpersist: Boolean): (DataFrame, String)

    Definition Classes
    AutomationTools
  246. def dataPrepCachingOff(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  247. def dataPrepCachingOn(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  248. def defaultFeaturesCol: String

    Definition Classes
    SanitizerDefaults
  249. def defaultLabelCol: String

    Global Defaults

    Definition Classes
    SanitizerDefaults
  250. def defaultPNorm: Double

    Definition Classes
    SanitizerDefaults
  251. def defaultPearsonAutoFilterNTile: Double

    Definition Classes
    SanitizerDefaults
  252. def defaultPearsonFilterDirection: String

    Definition Classes
    SanitizerDefaults
  253. def defaultPearsonFilterManualValue: Double

    Definition Classes
    SanitizerDefaults
  254. def defaultPearsonFilterMode: String

    Definition Classes
    SanitizerDefaults
  255. def defaultPearsonFilterStatistic: String

    Definition Classes
    SanitizerDefaults
  256. def defaultRenamedFeaturesCol: String

    Definition Classes
    SanitizerDefaults
  257. def defaultScalerMax: Double

    Definition Classes
    SanitizerDefaults
  258. def defaultScalerMin: Double

    Definition Classes
    SanitizerDefaults
  259. def defaultScalerType: String

    Definition Classes
    SanitizerDefaults
  260. def defaultStandardScalerMeanFlag: Boolean

    Definition Classes
    SanitizerDefaults
  261. def defaultStandardScalerStdDevFlag: Boolean

    Definition Classes
    SanitizerDefaults
  262. def deltaCheckBackingDirectoryRemovalOff(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  263. def deltaCheckBackingDirectoryRemovalOn(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  264. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  265. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  266. def extractGenerationData(payload: Array[GenericModelReturn]): Map[Int, (Double, Double)]

    Definition Classes
    AutomationTools
  267. def extractGenerationalScores(payload: Array[GenericModelReturn], scoringOptimizationStrategy: String, modelFamily: String, modelType: String): Array[GenerationalReport]

    Definition Classes
    AutomationTools
  268. def extractMLPCPayload(payload: MLPCConfig): Map[String, Any]

    Definition Classes
    AutomationTools
  269. def extractPayload(cc: Product): Map[String, Any]

    Definition Classes
    AutomationTools
  270. def featureInteractionOff(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  271. def featureInteractionOn(): DataPrep.this.type

    Definition Classes
    AutomationConfig
  272. def fieldRemovalCompare(preFilterFields: Array[String], postFilterFields: Array[String]): List[String]

    Definition Classes
    AutomationTools
  273. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  274. def generationDataFrameReport(generationalData: Array[GenerationalReport], sortingStrategy: String): DataFrame

    Definition Classes
    AutomationTools
  275. def getAutoStoppingFlag: Boolean

    Definition Classes
    AutomationConfig
  276. def getAutoStoppingScore: Double

    Definition Classes
    AutomationConfig
  277. def getCardinalityCheckMode: String

    Definition Classes
    AutomationConfig
  278. def getCardinalityLimit: Int

    Definition Classes
    AutomationConfig
  279. def getCardinalityPrecision: Double

    Definition Classes
    AutomationConfig
  280. def getCardinalitySwitch: Boolean

    Definition Classes
    AutomationConfig
  281. def getCardinalityType: String

    Definition Classes
    AutomationConfig
  282. def getCategoricalNAFillMap: Map[String, String]

    Definition Classes
    AutomationConfig
  283. def getCharacterFillStat: String

    Definition Classes
    AutomationConfig
  284. def getCharacterNABlanketFillValue: String

    Definition Classes
    AutomationConfig
  285. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  286. def getContinuousDataThreshold: Int

    Definition Classes
    AutomationConfig
  287. def getContinuousEvolutionGeneticMixing: Double

    Definition Classes
    AutomationConfig
  288. def getContinuousEvolutionMaxIterations: Int

    Definition Classes
    AutomationConfig
  289. def getContinuousEvolutionMutationAggressiveness: Int

    Definition Classes
    AutomationConfig
  290. def getContinuousEvolutionParallelism: Int

    Definition Classes
    AutomationConfig
  291. def getContinuousEvolutionRollingImporvementCount: Int

    Definition Classes
    AutomationConfig
  292. def getContinuousEvolutionStoppingScore: Double

    Definition Classes
    AutomationConfig
  293. def getCorrelationCutoffHigh: Double

    Definition Classes
    AutomationConfig
  294. def getCorrelationCutoffLow: Double

    Definition Classes
    AutomationConfig
  295. def getCovarianceConfig: CovarianceConfig

    Definition Classes
    AutomationConfig
  296. def getCovarianceFilterStatus: Boolean

    Definition Classes
    AutomationConfig
  297. def getDataPrepCachingStatus: Boolean

    Definition Classes
    AutomationConfig
  298. def getDataPrepParallelism: Int

    Definition Classes
    AutomationConfig
  299. def getDataReductionFactor: Double

    Definition Classes
    AutomationConfig
  300. def getDateTimeConversionType: String

    Permalink
    Definition Classes
    AutomationConfig
  301. def getDeltaCacheBackingDirectory: String

    Permalink
    Definition Classes
    AutomationConfig
  302. def getDeltaCacheBackingDirectoryRemovalFlag: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  303. def getEvolutionStrategy: String

    Permalink
    Definition Classes
    AutomationConfig
  304. def getFeatConfig: MainConfig

    Permalink
    Definition Classes
    AutomationConfig
  305. def getFeatureImportanceCutoffType: String

    Permalink
    Definition Classes
    AutomationConfig
  306. def getFeatureImportanceCutoffValue: Double

    Permalink
    Definition Classes
    AutomationConfig
  307. def getFeatureInteractionConfig: FeatureInteractionConfig

    Permalink
    Definition Classes
    AutomationConfig
  308. def getFeatureInteractionContinuousDiscretizerBucketCount: Int

    Permalink
    Definition Classes
    AutomationConfig
  309. def getFeatureInteractionParallelism: Int

    Permalink
    Definition Classes
    AutomationConfig
  310. def getFeatureInteractionRetentionMode: String

    Permalink
    Definition Classes
    AutomationConfig
  311. def getFeatureInteractionStatus: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  312. def getFeatureInteractionTargetInteractionPercentage: Double

    Permalink
    Definition Classes
    AutomationConfig
  313. def getFeaturesCol: String

    Permalink
    Definition Classes
    AutomationConfig
  314. def getFieldsToIgnore: Array[String]

    Permalink
    Definition Classes
    AutomationConfig
  315. def getFieldsToIgnoreInVector: Array[String]

    Permalink
    Definition Classes
    AutomationConfig
  316. def getFillConfig: FillConfig

    Permalink
    Definition Classes
    AutomationConfig
  317. def getFilterBounds: String

    Permalink
    Definition Classes
    AutomationConfig
  318. def getFilterPrecision: Double

    Permalink
    Definition Classes
    AutomationConfig
  319. def getFirstGenerationArraySeed: Long

    Permalink
    Definition Classes
    AutomationConfig
  320. def getFirstGenerationConfig: FirstGenerationConfig

    Permalink
    Definition Classes
    AutomationConfig
  321. def getFirstGenerationGenePool: Int

    Permalink
    Definition Classes
    AutomationConfig
  322. def getFirstGenerationIndexMixingMode: String

    Permalink
    Definition Classes
    AutomationConfig
  323. def getFirstGenerationMode: String

    Permalink
    Definition Classes
    AutomationConfig
  324. def getFirstGenerationPermutationCount: Int

    Permalink
    Definition Classes
    AutomationConfig
  325. def getFixedMutationValue: Int

    Permalink
    Definition Classes
    AutomationConfig
  326. def getGenerationalMutationStrategy: String

    Permalink
    Definition Classes
    AutomationConfig
  327. def getGeneticConfig: GeneticConfig

    Permalink
    Definition Classes
    AutomationConfig
  328. def getGeneticMixing: Double

    Permalink
    Definition Classes
    AutomationConfig
  329. def getHyperSpaceInferenceCount: Int

    Permalink
    Definition Classes
    AutomationConfig
  330. def getHyperSpaceInferenceStatus: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  331. def getHyperSpaceModelCount: Int

    Permalink
    Definition Classes
    AutomationConfig
  332. def getHyperSpaceModelType: String

    Permalink
    Definition Classes
    AutomationConfig
  333. def getInferenceConfigSaveLocation: String

    Permalink
    Definition Classes
    AutomationConfig
  334. def getKFold: Int

    Permalink
    Definition Classes
    AutomationConfig
  335. def getKGroups: Int

    Permalink
    Definition Classes
    AutomationConfig
  336. def getKMeansDistanceMeasurement: String

    Permalink
    Definition Classes
    AutomationConfig
  337. def getKMeansMaxIter: Int

    Permalink
    Definition Classes
    AutomationConfig
  338. def getKMeansPredictionCol: String

    Permalink
    Definition Classes
    AutomationConfig
  339. def getKMeansSeed: Long

    Permalink
    Definition Classes
    AutomationConfig
  340. def getKMeansTolerance: Double

    Permalink
    Definition Classes
    AutomationConfig
  341. def getKSampleConfig: KSampleConfig

    Permalink
    Definition Classes
    AutomationConfig
  342. def getLSHHashTables: Int

    Permalink
    Definition Classes
    AutomationConfig
  343. def getLSHOutputCol: String

    Permalink
    Definition Classes
    AutomationConfig
  344. def getLabelCol: String

    Permalink
    Definition Classes
    AutomationConfig
  345. def getLowerFilterNTile: Double

    Permalink
    Definition Classes
    AutomationConfig
  346. def getMainConfig: MainConfig

    Permalink
    Definition Classes
    AutomationConfig
  347. def getMinimumVectorCountToMutate: Int

    Permalink
    Definition Classes
    AutomationConfig
  348. def getMlFlowBestSuffix: String

    Permalink
    Definition Classes
    AutomationConfig
  349. def getMlFlowConfig: MLFlowConfig

    Permalink
    Definition Classes
    AutomationConfig
  350. def getMlFlowCustomRunTags: Map[String, String]

    Permalink
    Definition Classes
    AutomationConfig
  351. def getMlFlowExperimentName: String

    Permalink
    Definition Classes
    AutomationConfig
  352. def getMlFlowLogArtifactsFlag: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  353. def getMlFlowLoggingFlag: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  354. def getMlFlowLoggingMode: String

    Permalink
    Definition Classes
    AutomationConfig
  355. def getMlFlowModelSaveDirectory: String

    Permalink
    Definition Classes
    AutomationConfig
  356. def getMlFlowTrackingURI: String

    Permalink
    Definition Classes
    AutomationConfig
  357. def getModelSeedMap: Map[String, Any]

    Permalink
    Definition Classes
    AutomationConfig
  358. def getModelSeedSetStatus: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  359. def getModelSelectionDistinctThreshold: Int

    Permalink
    Definition Classes
    AutomationConfig
  360. def getModelingFamily: String

    Permalink
    Definition Classes
    AutomationConfig
  361. def getMutationMagnitudeMode: String

    Permalink
    Definition Classes
    AutomationConfig
  362. def getMutationMode: String

    Permalink
    Definition Classes
    AutomationConfig
  363. def getMutationValue: Double

    Permalink
    Definition Classes
    AutomationConfig
  364. def getNAFillFilterPrecision: Double

    Permalink
    Definition Classes
    AutomationConfig
  365. def getNAFillMode: String

    Permalink
    Definition Classes
    AutomationConfig
  366. def getNaFillStatus: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  367. def getNumberOfGenerations: Int

    Permalink
    Definition Classes
    AutomationConfig
  368. def getNumberOfMutationsPerGeneration: Int

    Permalink
    Definition Classes
    AutomationConfig
  369. def getNumberOfParentsToRetain: Int

    Permalink
    Definition Classes
    AutomationConfig
  370. def getNumericBoundaries: Map[String, (Double, Double)]

    Permalink
    Definition Classes
    AutomationConfig
  371. def getNumericFillStat: String

    Permalink
    Definition Classes
    AutomationConfig
  372. def getNumericNABlanketFillValue: Double

    Permalink
    Definition Classes
    AutomationConfig
  373. def getNumericNAFillMap: Map[String, AnyVal]

    Permalink
    Definition Classes
    AutomationConfig
  374. def getOneHotEncodingStatus: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  375. def getOutlierConfig: OutlierConfig

    Permalink
    Definition Classes
    AutomationConfig
  376. def getOutlierFilterStatus: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  377. def getPNorm: Double

    Permalink
    Definition Classes
    AutomationConfig
  378. def getParallelism: Int

    Permalink
    Definition Classes
    AutomationConfig
  379. def getPearsonAutoFilterNTile: Double

    Permalink
    Definition Classes
    AutomationConfig
  380. def getPearsonConfig: PearsonConfig

    Permalink
    Definition Classes
    AutomationConfig
  381. def getPearsonFilterDirection: String

    Permalink
    Definition Classes
    AutomationConfig
  382. def getPearsonFilterManualValue: Double

    Permalink
    Definition Classes
    AutomationConfig
  383. def getPearsonFilterMode: String

    Permalink
    Definition Classes
    AutomationConfig
  384. def getPearsonFilterStatistic: String

    Permalink
    Definition Classes
    AutomationConfig
  385. def getPearsonFilterStatus: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  386. def getPipelineId: String

    Permalink
    Definition Classes
    AutomationConfig
  387. def getQuorumCount: Int

    Permalink
    Definition Classes
    AutomationConfig
  388. def getScalerMax: Double

    Permalink
    Definition Classes
    AutomationConfig
  389. def getScalerMin: Double

    Permalink
    Definition Classes
    AutomationConfig
  390. def getScalerType: String

    Permalink
    Definition Classes
    AutomationConfig
  391. def getScalingConfig: ScalingConfig

    Permalink
    Definition Classes
    AutomationConfig
  392. def getScalingStatus: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  393. def getScoringMetric: String

    Permalink
    Definition Classes
    AutomationConfig
  394. def getScoringOptimizationStrategy: String

    Permalink
    Definition Classes
    AutomationConfig
  395. def getSeed: Long

    Permalink
    Definition Classes
    AutomationConfig
  396. def getSplitCachingStrategy: String

    Permalink
    Definition Classes
    AutomationConfig
  397. def getStandardScalingMeanFlag: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  398. def getStandardScalingStdDevFlag: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  399. def getStringBoundaries: Map[String, List[String]]

    Permalink
    Definition Classes
    AutomationConfig
  400. def getSyntheticCol: String

    Permalink
    Definition Classes
    AutomationConfig
  401. def getTrainPortion: Double

    Permalink
    Definition Classes
    AutomationConfig
  402. def getTrainSplitChronologicalColumn: String

    Permalink
    Definition Classes
    AutomationConfig
  403. def getTrainSplitChronologicalRandomPercentage: Double

    Permalink
    Definition Classes
    AutomationConfig
  404. def getTrainSplitMethod: String

    Permalink
    Definition Classes
    AutomationConfig
  405. def getTreeSplitsConfig: MainConfig

    Permalink
    Definition Classes
    AutomationConfig
  406. def getUpperFilterNTile: Double

    Permalink
    Definition Classes
    AutomationConfig
  407. def getVarianceFilterStatus: Boolean

    Permalink
    Definition Classes
    AutomationConfig
  408. def getVectorMutationMethod: String

    Permalink
    Definition Classes
    AutomationConfig
  409. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  410. def hyperSpaceInferenceOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  411. def hyperSpaceInferenceOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  412. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  413. def mlFlowLogArtifactsOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  414. def mlFlowLogArtifactsOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  415. def mlFlowLoggingOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  416. def mlFlowLoggingOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  417. def naFillOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  418. def naFillOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  419. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  420. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  421. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  422. def oneHotEncodingOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  423. def oneHotEncodingOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  424. def outlierFilterOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  425. def outlierFilterOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  426. def pearsonFilterOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  427. def pearsonFilterOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  428. def prepData(): DataGeneration

    Permalink
  429. def prettyPrintConfig(config: AnyRef): String

    Permalink

    Prints a human-readable report to stdout and to the logs showing the configuration for a model run, with each key -> value relationship rendered as JSON.

    config

    AnyRef -> a defined case class

    returns

    String in the form of pretty print syntax

    Definition Classes
    AutomationTools
  430. def printSchema(schema: Array[String], dataName: String): String

    Permalink
    Definition Classes
    AutomationTools
  431. def printSchema(df: DataFrame, dataName: String): String

    Permalink
    Definition Classes
    AutomationTools
  432. def recordInferenceDataConfig(config: MainConfig, startingFields: Array[String]): InferenceDataConfig

    Permalink

    Helper method for generating the Inference Config object that records the data configuration steps needed to reproduce the modeling for subsequent inference runs.

    config

    The full main Config that is utilized for the execution of the run.

    startingFields

    The fields that are returned from type casting and validation. These may contain artificial suffixes for StringIndexer (_si) and OneHotEncoder (_oh), which will be removed before recording.

    returns

    an instance of InferenceDataConfig

    Definition Classes
    AutomationTools
    Since

    0.4.0

  433. def recordInferenceSwitchSettings(config: MainConfig): InferenceSwitchSettings

    Permalink

    Single-pass method for recording all switch settings to the InferenceConfig Object.

    config

    MainConfig used for starting the training AutoML run

    Definition Classes
    AutomationTools
  434. lazy val sc: SparkContext

    Permalink
    Definition Classes
    SparkSessionWrapper
  435. def scalingOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  436. def scalingOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  437. def setAutoStoppingScore(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  438. def setCardinalityCheckMode(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[AssertionError] )
  439. def setCardinalityLimit(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
  440. def setCardinalityPrecision(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
  441. def setCardinalitySwitch(value: Boolean): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  442. def setCardinalityThreshold(value: Int): DataPrep.this.type

    Permalink

    Setter for overriding the cardinality threshold above which an exception is thrown. [WARNING] Increasing this value on a sufficiently large data set could incur excessive memory and CPU pressure on the cluster at runtime.

    value

    Int: the limit above which an exception will be thrown for a classification problem wherein the label distinct count is too large to successfully generate synthetic data.

    Definition Classes
    AutomationConfig
    Since

    0.5.1

    Note

    Default: 20

  443. def setCardinalityType(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[AssertionError] )
  444. def setCategoricalNAFillMap(value: Map[String, String]): DataPrep.this.type

    Permalink

    Setter for providing a map of [Column Name -> String Fill Value] for manual by-column overrides. Any non-specified fields in this map will utilize the "auto" statistics-based fill paradigm to calculate and fill any NA values in non-numeric columns.

    value

    Map[String, String]: Column Name as String -> Fill Value as String

    Definition Classes
    AutomationConfig
    Since

    0.5.2

    Note

    If fields are specified in here that are not part of the DataFrame's schema, an exception will be thrown.

    if naFillMode is specified as using Map Fill modes, this setter or the numeric na fill map MUST be set.
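
    The override behavior described above can be sketched in plain Scala without Spark. This is an illustration only: the object name, the resolveFills helper, and the autoFill function are hypothetical, and the use of require (which throws IllegalArgumentException) is an assumption, since the documentation only states that an exception is thrown for unknown fields.

```scala
object CategoricalFillSketch {

  /** Resolve a fill value for every categorical column.
    * Columns named in `overrides` use the supplied value; every other
    * column falls back to `autoFill`, a stand-in for the statistics-based
    * "auto" fill described in the documentation.
    */
  def resolveFills(
      schemaColumns: Seq[String],
      overrides: Map[String, String],
      autoFill: String => String
  ): Map[String, String] = {
    // Assumption: unknown column names are rejected up front.
    val unknown = overrides.keySet -- schemaColumns
    require(unknown.isEmpty, s"Columns not in schema: ${unknown.mkString(", ")}")
    schemaColumns.map(c => c -> overrides.getOrElse(c, autoFill(c))).toMap
  }
}
```

    For example, resolving columns "city" and "state" with an override for "city" only leaves "state" to the fallback function.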

  445. def setCharacterFillStat(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  446. def setCharacterNABlanketFillValue(value: String): DataPrep.this.type

    Permalink

    Setter for providing a 'blanket override' value (fill all found categorical columns' missing values with this specified value).

    value

    String: A value to fill all categorical na values in the DataFrame with.

    Definition Classes
    AutomationConfig
    Since

    0.5.2

  447. def setContinuousDataThreshold(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  448. def setContinuousEvolutionGeneticMixing(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  449. def setContinuousEvolutionImprovementThreshold(value: Int): DataPrep.this.type

    Permalink

    Setter for defining the secondary stopping criterion for continuous training mode: the number of consecutive non-improving runs after which the learning algorithm terminates due to diminishing returns.

    value

    Negative Integer. An improvement over the prior best result resets the counter, and each subsequent non-improvement decrements a mutable counter. If the counter reaches the limit specified in value, the continuous mode algorithm will stop.

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.0

    Exceptions thrown

    IllegalArgumentException if the value is positive.
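
    A minimal sketch of the counter logic described above, in plain Scala. The object and method names are hypothetical, and it assumes that a higher score is better and that an "improvement" means strictly exceeding the best score seen so far; the library's actual implementation may differ.

```scala
object ImprovementStopSketch {

  /** Returns how many runs are evaluated before the stopping rule fires,
    * or scores.length if it never fires.
    */
  def runsUntilStop(scores: Seq[Double], threshold: Int): Int = {
    // Mirrors the documented rule: positive values are rejected.
    require(threshold <= 0, s"threshold must not be positive, got $threshold")
    var best = Double.NegativeInfinity
    var counter = 0
    var runs = 0
    var stopped = false
    val it = scores.iterator
    while (it.hasNext && !stopped) {
      val score = it.next()
      runs += 1
      if (score > best) {
        best = score
        counter = 0 // an improvement resets the counter
      } else {
        counter -= 1 // a non-improvement decrements it
        if (counter <= threshold) stopped = true
      }
    }
    runs
  }
}
```

    With a threshold of -3, the run stops at the third consecutive score that fails to beat the best so far.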

  450. def setContinuousEvolutionMaxIterations(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  451. def setContinuousEvolutionMutationAggressiveness(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  452. def setContinuousEvolutionParallelism(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  453. def setContinuousEvolutionRollingImprovementCount(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  454. def setContinuousEvolutionStoppingScore(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  455. def setCorrelationCutoffHigh(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  456. def setCorrelationCutoffLow(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  457. def setDataPrepParallelism(value: Int): DataPrep.this.type

    Permalink

    Setter for defining the number of concurrent threads allocated to performing asynchronous data prep tasks within the feature engineering aspect of this application.

    value

    Int: A value that must be greater than zero.

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.0

    Exceptions thrown

    IllegalArgumentException if a value less than or equal to zero is supplied.

    Note

    This value has an upper limit, depending on driver size, that will restrict the efficacy of the asynchronous tasks within the pool. Setting this too high may cause cluster instability.
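
    The setters on this page return this.type and validate their arguments, which allows calls to be chained. The pattern can be sketched as follows; the class name and the default values are hypothetical, and this is not the library's source.

```scala
// Hypothetical stand-in for a configuration class with validated, chainable setters.
class PrepSettingsSketch {
  private var dataPrepParallelism: Int = 1        // placeholder default
  private var featureInteractionParallelism: Int = 1 // placeholder default

  def setDataPrepParallelism(value: Int): this.type = {
    // require throws IllegalArgumentException when the predicate is false
    require(value > 0, s"DataPrepParallelism must be greater than zero, got $value")
    dataPrepParallelism = value
    this
  }

  def setFeatureInteractionParallelism(value: Int): this.type = {
    require(value >= 1, s"FeatureInteractionParallelism must be at least 1, got $value")
    featureInteractionParallelism = value
    this
  }

  def getDataPrepParallelism: Int = dataPrepParallelism
  def getFeatureInteractionParallelism: Int = featureInteractionParallelism
}
```

    Returning this.type rather than the class name keeps the chain typed as the concrete subclass, which is why the signatures on this page read DataPrep.this.type even though the setters are defined in AutomationConfig.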

  458. def setDataReductionFactor(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  459. def setDateTimeConversionType(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  460. def setDeltaCacheBackingDirectory(value: String): DataPrep.this.type

    Permalink

    Setter for providing a path to write the kfold train/test splits as Delta data sets to (useful for extremely large data sets or a situation where using local disk storage might be prohibitively expensive)

    value

    String path to a dbfs location for creating the temporary (or persisted)

    Definition Classes
    AutomationConfig
    Since

    0.7.1

  461. def setDeltaCacheBackingDirectoryRemovalFlag(value: Boolean): DataPrep.this.type

    Permalink

    Setter for whether or not to delete the written train/test splits for the run in Delta. Defaults to true, which means that the job will delete the data on the object store to clean up after the run is completed if the splitCachingStrategy is set to 'delta'.

    value

    Boolean - true => delete; false => leave on Object Store

    Definition Classes
    AutomationConfig
    Since

    0.7.1

  462. def setEvolutionStrategy(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  463. def setFeatConfig(value: MainConfig): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  464. def setFeatConfig(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  465. def setFeatureImportanceCutoffType(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  466. def setFeatureImportanceCutoffValue(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  467. def setFeatureInteractionContinuousDiscretizerBucketCount(value: Int): DataPrep.this.type

    Permalink

    Setter for determining the behavior of continuous feature columns. In order to calculate Entropy for a continuous variable, the distribution must be converted to nominal values for estimation of per-split information gain. This setting defines how many nominal categorical values to create out of a continuously distributed feature in order to calculate Entropy.

    value

    Int -> must be greater than 1

    Definition Classes
    AutomationConfig
    Since

    0.6.2

    Exceptions thrown

    IllegalArgumentException if the value specified is <= 1
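
    To illustrate why a bucket count is needed, the sketch below converts a continuous feature into nominal bucket indices and computes Shannon entropy over them. It assumes equal-width buckets purely for simplicity; the library may discretize differently (for example by quantiles), and all names here are hypothetical.

```scala
object BucketEntropySketch {

  /** Assign each value to one of `bucketCount` equal-width buckets. */
  def bucketize(values: Seq[Double], bucketCount: Int): Seq[Int] = {
    require(bucketCount > 1, s"bucketCount must be greater than 1, got $bucketCount")
    require(values.nonEmpty, "values must not be empty")
    val lo = values.min
    val width = (values.max - lo) / bucketCount
    values.map { v =>
      if (width == 0.0) 0 // all values identical: a single bucket
      else math.min(((v - lo) / width).toInt, bucketCount - 1)
    }
  }

  /** Shannon entropy (base 2) of a sequence of nominal labels. */
  def entropy(labels: Seq[Int]): Double = {
    val n = labels.size.toDouble
    labels.groupBy(identity).values.map { group =>
      val p = group.size / n
      -p * math.log(p) / math.log(2.0)
    }.sum
  }
}
```

    A larger bucket count gives a finer-grained estimate at the cost of sparser buckets.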

  468. def setFeatureInteractionParallelism(value: Int): DataPrep.this.type

    Permalink

    Setter for configuring the concurrent count for scoring of feature interaction candidates. Due to the nature of these operations, the configuration here may need to be set differently to that of the modeling and general feature engineering phases of the toolkit. This is highly dependent on the row count of the data set being submitted.

    value

    Int -> must be greater than 0

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.2

    Exceptions thrown

    IllegalArgumentException if the value is < 1

  469. def setFeatureInteractionRetentionMode(value: String): DataPrep.this.type

    Permalink

    Setter for determining the mode of operation for inclusion of interacted features. Modes are:

    • all -> Includes all interactions between all features (after string indexing of categorical values)
    • optimistic -> The interaction is kept if its Information Gain / Variance, as compared to at least ONE of the parents of the interaction, is above the threshold set by featureInteractionTargetInteractionPercentage (e.g. if the IG of the left parent is 0.5 and that of the right parent is 0.9, with the threshold set at 10, an interaction between these two parents with an IG of 0.42 would be rejected, but one with an IG of 0.46 would be kept)
    • strict -> the threshold percentage must be met for BOTH parents. (in the above example, the IG for the interaction would have to be > 0.81 in order to be included in the feature vector).
    value

    String -> one of: 'all', 'optimistic', or 'strict'

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.2

    Exceptions thrown

    IllegalArgumentException if the specified value submitted is not permitted
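
    The three modes can be sketched as a small decision function. The formula for the minimum score, parent score times (1 - percentage / 100), is inferred from the worked example above (0.5 with a threshold of 10 gives 0.45, and 0.9 gives 0.81) and is an assumption rather than the library's confirmed rule; the object and method names are hypothetical.

```scala
object RetentionModeSketch {

  /** Minimum interaction score relative to one parent (inferred formula). */
  def minimumScore(parentScore: Double, targetPercentage: Double): Double =
    parentScore * (1.0 - targetPercentage / 100.0)

  def keepInteraction(
      mode: String,
      interactionScore: Double,
      leftParentScore: Double,
      rightParentScore: Double,
      targetPercentage: Double
  ): Boolean = {
    val passesLeft = interactionScore > minimumScore(leftParentScore, targetPercentage)
    val passesRight = interactionScore > minimumScore(rightParentScore, targetPercentage)
    mode match {
      case "all"        => true                       // keep every interaction
      case "optimistic" => passesLeft || passesRight  // at least one parent
      case "strict"     => passesLeft && passesRight  // both parents
      case other =>
        throw new IllegalArgumentException(s"Unsupported retention mode: $other")
    }
  }
}
```

    Using the numbers from the example, an interaction score of 0.46 passes in optimistic mode (it clears 0.45 for the left parent) but fails in strict mode (it does not clear 0.81 for the right parent).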

  470. def setFeatureInteractionTargetInteractionPercentage(value: Double): DataPrep.this.type

    Permalink

    Setter for establishing the minimum acceptable InformationGain or Variance allowed for an interaction candidate based on comparison to the scores of its parents.

    value

    Double in range of -inf -> inf

    Definition Classes
    AutomationConfig
    Since

    0.6.2

  471. def setFeaturesCol(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  472. def setFieldsToIgnore(value: Array[String]): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  473. def setFieldsToIgnoreInVector(value: Array[String]): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  474. def setFilterBounds(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  475. def setFilterPrecision(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  476. def setFirstGenerationArraySeed(value: Long): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  477. def setFirstGenerationGenePool(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  478. def setFirstGenerationIndexMixingMode(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  479. def setFirstGenerationMode(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  480. def setFirstGenerationPermutationCount(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  481. def setFixedMutationValue(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  482. def setGenerationalMutationStrategy(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  483. def setGeneticMBOCandidateFactor(value: Int): DataPrep.this.type

    Permalink

    Setter for defining the factor to be applied to the candidate listing of hyperparameters to generate through mutation for each generation other than the initial and post-modeling optimization phases. The larger this value (default: 10), the more potential space can be searched. There is not a large performance hit to this, and as such, values in excess of 100 are viable.

    value

    Int - a factor to multiply the numberOfMutationsPerGeneration by to generate a count of potential candidates.

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.0

    Exceptions thrown

    IllegalArgumentException if the value is not greater than zero.
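
    As a quick worked illustration of the factor, the candidate pool size is the product of the factor and numberOfMutationsPerGeneration. The helper below is hypothetical and only restates that arithmetic, using the documented default of 10.

```scala
object MboCandidateSketch {

  /** Number of candidate hyperparameter sets generated per generation. */
  def candidateCount(numberOfMutationsPerGeneration: Int, candidateFactor: Int = 10): Int = {
    require(candidateFactor > 0, s"candidateFactor must be greater than zero, got $candidateFactor")
    numberOfMutationsPerGeneration * candidateFactor
  }
}
```

    With 10 mutations per generation, the default factor yields 100 candidates and a factor of 100 yields 1000.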

  484. def setGeneticMBORegressorType(value: String): DataPrep.this.type

    Permalink

    Setter for selecting the type of Regressor to use for the within-epoch generation MBO of candidates

    value

    String - one of "XGBoost", "LinearRegression" or "RandomForest"

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.6.0

    Exceptions thrown

    IllegalArgumentException if the value is not supported

  485. def setGeneticMixing(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  486. def setHyperSpaceInferenceCount(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  487. def setHyperSpaceModelCount(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  488. def setHyperSpaceModelType(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  489. def setInferenceConfigSaveLocation(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
  490. def setKFold(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  491. def setKGroups(value: Int): DataPrep.this.type

    Permalink

    Setter for specifying the number of K-Groups to generate in the KMeans model

    value

    Int: number of k groups to generate

    returns

    this

    Definition Classes
    AutomationConfig
  492. def setKMeansDistanceMeasurement(value: String): DataPrep.this.type

    Permalink

    Setter for which distance measurement to use to calculate the nearness of vectors to a centroid

    value

    String: Options -> "euclidean" or "cosine" Default: "euclidean"

    returns

    this

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if an invalid value is entered
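
    The setters on this page share one contract: each returns the receiver (this.type), so calls can be chained, and some reject unsupported values with an IllegalArgumentException. A minimal sketch of that contract follows. The class KMeansSettingsSketch, its getters, and the placeholder value for kGroups are hypothetical and exist only for illustration; this is not the library's implementation.

```scala
// Hypothetical stand-in for the inherited configuration class; it only models
// the setter contract described above, not the library's implementation.
class KMeansSettingsSketch {
  private val allowedDistances = Set("euclidean", "cosine")
  private val allowedText = allowedDistances.mkString(", ")
  private var distanceMeasurement: String = "euclidean" // documented default
  private var kGroups: Int = 2 // placeholder value, not the library default

  // Rejects unsupported values with IllegalArgumentException (raised by require).
  def setKMeansDistanceMeasurement(value: String): this.type = {
    require(
      allowedDistances.contains(value),
      s"Distance measurement '$value' is not supported; use one of: $allowedText"
    )
    distanceMeasurement = value
    this
  }

  def setKGroups(value: Int): this.type = {
    kGroups = value
    this
  }

  def getKMeansDistanceMeasurement: String = distanceMeasurement
  def getKGroups: Int = kGroups
}

// Setters chain because each one returns the receiver.
val settings = new KMeansSettingsSketch()
  .setKGroups(10)
  .setKMeansDistanceMeasurement("cosine")
```

    Returning this.type rather than the declaring class keeps the subclass type through the chain, which is why every signature here reads DataPrep.this.type even though the setters are defined in AutomationConfig.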

  493. def setKMeansMaxIter(value: Int): DataPrep.this.type

    Permalink

    Setter for specifying the maximum number of iterations for the KMeans model to go through to converge

    value

    Int: Maximum limit on iterations

    returns

    this

    Definition Classes
    AutomationConfig
  494. def setKMeansPredictionCol(value: String): DataPrep.this.type

    Permalink

    Setter for the internal KMeans column for cluster membership attribution

    value

    String: column name for internal algorithm column for group membership

    returns

    this

    Definition Classes
    AutomationConfig
  495. def setKMeansSeed(value: Long): DataPrep.this.type

    Permalink

    Setter for a KMeans seed for the clustering algorithm

    value

    Long: Seed value

    returns

    this

    Definition Classes
    AutomationConfig
  496. def setKMeansTolerance(value: Double): DataPrep.this.type

    Permalink

    Setter for the tolerance for KMeans (must be >0)

    value

    The tolerance value setting for KMeans

    returns

    this

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if a value less than 0 is entered

    See also

    reference: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.clustering.KMeans for further details.

  497. def setKSampleConfig(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  498. def setLSHHashTables(value: Int): DataPrep.this.type

    Permalink

    Setter for configuring the number of hash tables to use for MinHashLSH

    value

    Int: Count of hash tables to use

    returns

    this

    Definition Classes
    AutomationConfig
    See also

    http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.MinHashLSH for more information

  499. def setLSHOutputCol(value: String): DataPrep.this.type

    Permalink

    Setter for the internal LSH output hash information column

    value

    String: column name for the internal MinHashLSH Model transformation value

    returns

    this

    Definition Classes
    AutomationConfig
  500. def setLSHSeed(value: Long): DataPrep.this.type

    Permalink

    Setter for configuring the seed value for the LSH MinHash model

    value

    Long: A Seed value

    Definition Classes
    AutomationConfig
    Since

    0.5.1

  501. def setLabelBalanceMode(value: String): DataPrep.this.type

    Permalink

    Setter for determining the label balance approach mode.

    value

    String: one of: 'match', 'percentage' or 'target'

    Definition Classes
    AutomationConfig
    Annotations
    @throws( ... )
    Since

    0.5.1

    Exceptions thrown

    UnsupportedOperationException() if the provided mode is not supported.

    Note

    Default: "percentage"

    Available modes:
    'match': Will match all smaller class counts to the largest class count. [WARNING] - May significantly increase memory pressure!
    'percentage': Will adjust smaller classes to a percentage value of the largest class count.
    'target': Will increase smaller class counts to a fixed numeric target of rows.
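
    The three label balance modes can be read as rules for the row count each class should reach. The sketch below works through that arithmetic in plain Scala. The function balancedCounts and its parameter names are hypothetical, and the rounding and the choice to never shrink a class are assumptions; the library's actual computation is not shown in this documentation.

```scala
// Sketch of the per-class row counts implied by each label balance mode.
// `counts` maps a class label to its current row count and must be non-empty.
// Assumption: classes are only ever grown, never reduced.
def balancedCounts(
    counts: Map[String, Long],
    mode: String,
    ratio: Double = 0.2, // documented default for 'percentage' mode
    target: Long = 0L    // hypothetical placeholder for 'target' mode
): Map[String, Long] = {
  val largest = counts.values.max
  mode match {
    case "match" =>
      // Every class is raised to the size of the largest class.
      counts.map { case (label, _) => label -> largest }
    case "percentage" =>
      // Smaller classes are raised to a fraction of the largest class.
      val floor = math.round(largest * ratio)
      counts.map { case (label, n) => label -> math.max(n, floor) }
    case "target" =>
      // Smaller classes are raised to a fixed row count.
      counts.map { case (label, n) => label -> math.max(n, target) }
    case other =>
      throw new UnsupportedOperationException(s"Label balance mode '$other' is not supported")
  }
}
```

    With class counts of 1000, 50 and 400 rows, 'percentage' at a ratio of 0.2 raises only the 50-row class (to 200), while 'match' raises both smaller classes to 1000, which is where the memory warning comes from.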

  502. def setLabelCol(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  503. def setLowerFilterNTile(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  504. def setMainConfig(value: MainConfig): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  505. def setMainConfig(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  506. def setMinimumVectorCountToMutate(value: Int): DataPrep.this.type

    Permalink

    Setter for minimum threshold for vector indexes to mutate within the feature vector.

    value

    The minimum (or fixed) number of indexes to mutate.

    returns

    this

    Definition Classes
    AutomationConfig
    Note

    In vectorMutationMethod "fixed" this sets the fixed count of how many vector positions to mutate. In vectorMutationMethod "random" this sets the lower threshold for 'at least this many indexes will be mutated'

  507. def setMlFlowAPIToken(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  508. def setMlFlowBestSuffix(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  509. def setMlFlowConfig(value: MLFlowConfig): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  510. def setMlFlowCustomRunTags(value: Map[String, String]): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  511. def setMlFlowExperimentName(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  512. def setMlFlowLoggingMode(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  513. def setMlFlowModelSaveDirectory(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
  514. def setMlFlowTrackingURI(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  515. def setModelSeedMap(value: Map[String, Any]): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  516. def setModelSeedString(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  517. def setModelSelectionDistinctThreshold(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  518. def setModelingFamily(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  519. def setMutationMagnitudeMode(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  520. def setMutationMode(value: String): DataPrep.this.type

    Permalink

    Setter for the Mutation Mode of the feature vector individual values

    value

    String: the mode to use.

    returns

    this

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if the mode is not supported.

    Note

    Options:
    "weighted" - uses weighted averaging to scale the euclidean distance between the centroid vector and mutation candidate vectors
    "random" - randomly selects a position on the euclidean vector between the centroid vector and the candidate mutation vectors
    "ratio" - uses a ratio between the values of the centroid vector and the mutation vector

  521. def setMutationValue(value: Double): DataPrep.this.type

    Permalink

    Setter for specifying the mutation magnitude for the modes 'weighted' and 'ratio' in mutationMode

    value

    Double: value between 0 and 1 for mutation magnitude adjustment.

    returns

    this

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if the value specified is outside of the range (0, 1)

    Note

    The higher this value, the closer the synthetic row data will be to the centroid vector rather than to the candidate mutation vector.
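
    One way to read the "weighted" and "random" mutation modes is as linear interpolation between the centroid vector and a candidate vector, with the mutation value acting as the weight on the centroid. The sketch below illustrates that reading only. The function mutateRow is hypothetical, treating the bounds 0 and 1 as allowed is an assumption, and the "ratio" mode is left out because this documentation does not describe it in enough detail to sketch.

```scala
import scala.util.Random

// Sketch: build a synthetic row between a centroid and a candidate vector.
// A weight of 1.0 reproduces the centroid; a weight of 0.0 reproduces the candidate.
def mutateRow(
    centroid: Array[Double],
    candidate: Array[Double],
    mode: String,
    mutationValue: Double,
    rng: Random
): Array[Double] = {
  require(centroid.length == candidate.length, "Vectors must have the same length")
  require(mutationValue >= 0.0 && mutationValue <= 1.0, "mutationValue must lie between 0 and 1")
  val weight = mode match {
    case "weighted" => mutationValue    // fixed weight taken from the mutation value
    case "random"   => rng.nextDouble() // one random position on the connecting line
    case other =>
      throw new IllegalArgumentException(s"Mutation mode '$other' is not covered by this sketch")
  }
  centroid.zip(candidate).map { case (c, m) => c * weight + m * (1.0 - weight) }
}
```

    Under this reading, a mutation value of 0.75 places the synthetic row three quarters of the way from the candidate toward the centroid, which matches the note that higher values stay closer to the centroid.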

  522. def setNAFillFilterPrecision(value: Double): DataPrep.this.type

    Permalink

    Setter for defining the precision for calculating the model type as per the label column

    value

    Double: Precision accuracy for approximate distinct calculation.

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[AssertionError] )
    Since

    0.5.2

    Exceptions thrown

    java.lang.AssertionError If the value is outside of the allowable range of {0, 1}

    Note

    setting this value to zero (0) for a large regression problem will incur a long processing time and an expensive shuffle.

  523. def setNAFillMode(value: String): DataPrep.this.type

    Permalink

    Mode for NA fill.
    Available modes:
    auto: Stats-based NA fill for fields. The statistics used for the fill are selected with .setNumericFillStat and .setCharacterFillStat.
    mapFill: Custom by-column overrides to 'blanket fill' NA values on a per-column basis. The categorical (string) fields are set via .setCategoricalNAFillMap while the numeric fields are set via .setNumericNAFillMap.
    blanketFillAll: Fills all fields based on the values specified by .setCharacterNABlanketFillValue and .setNumericNABlanketFillValue. All NA values of the appropriate types will be filled accordingly throughout all columns.
    blanketFillCharOnly: Uses statistics to fill numeric fields, but replaces NA values in all categorical (character) fields with a blanket fill value.
    blanketFillNumOnly: Uses statistics to fill character fields, but replaces NA values in all numeric fields with a blanket fill value.

    value

    String: Mode for NA Fill

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Since

    0.5.2

    Exceptions thrown

    IllegalArgumentException if the mode specified is not supported.
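
    The five NA fill modes differ only in which strategy applies to numeric columns and which to character columns. The sketch below encodes that mapping as a lookup. The names FillStrategy, FillPlan and fillPlan are hypothetical, and the assumption that unmapped character columns fall back to statistics in mapFill mode mirrors what is documented for numeric columns; it is not confirmed for categorical ones.

```scala
// Hypothetical types describing how NA values are filled for one column kind.
sealed trait FillStrategy
case object StatsFill extends FillStrategy        // statistics-based fill
case object BlanketFill extends FillStrategy      // one fixed value for every column
case object MapThenStatsFill extends FillStrategy // per-column override, statistics otherwise

final case class FillPlan(numeric: FillStrategy, character: FillStrategy)

// Maps a mode name to the strategy used for numeric and character columns.
def fillPlan(mode: String): FillPlan = mode match {
  case "auto"                => FillPlan(StatsFill, StatsFill)
  case "mapFill"             => FillPlan(MapThenStatsFill, MapThenStatsFill)
  case "blanketFillAll"      => FillPlan(BlanketFill, BlanketFill)
  case "blanketFillCharOnly" => FillPlan(StatsFill, BlanketFill)
  case "blanketFillNumOnly"  => FillPlan(BlanketFill, StatsFill)
  case other =>
    throw new IllegalArgumentException(s"NA fill mode '$other' is not supported")
}
```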

  524. def setNumberOfGenerations(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  525. def setNumberOfMutationsPerGeneration(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  526. def setNumberOfParentsToRetain(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  527. def setNumericBoundaries(value: Map[String, (Double, Double)]): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  528. def setNumericFillStat(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  529. def setNumericNABlanketFillValue(value: Double): DataPrep.this.type

    Permalink

    Setter for providing a 'blanket override' value (fill all found numeric columns' missing values with this specified value)

    value

    Double: A value used to fill all numeric NA values in the DataFrame.

    Definition Classes
    AutomationConfig
    Since

    0.5.2

  530. def setNumericNAFillMap(value: Map[String, AnyVal]): DataPrep.this.type

    Permalink

    Setter for providing a map of [Column Name -> AnyVal Fill Value] (must be numeric). Any non-specified fields in this map will utilize the "auto" statistics-based fill paradigm to calculate and fill any NA values in numeric columns.

    value

    Map[String, AnyVal]: Column Name as String -> Fill Numeric Type Value

    Definition Classes
    AutomationConfig
    Since

    0.5.2

    Note

    If fields are specified in here that are not part of the DataFrame's schema, an exception will be thrown.

    If naFillMode is specified as using Map Fill modes, this setter or the categorical na fill map MUST be set.
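
    The schema check in the note above amounts to comparing the map's keys with the DataFrame's column names. A minimal sketch of such a check follows. The function validateFillMap is hypothetical and takes the column names as a plain sequence so that it runs without Spark; the exception type is also an assumption, since the note does not say which exception is thrown.

```scala
// Sketch: reject fill-map keys that do not correspond to a known column.
// `schemaColumns` stands in for the DataFrame's column names.
def validateFillMap(schemaColumns: Seq[String], fillMap: Map[String, AnyVal]): Unit = {
  val unknown = fillMap.keySet -- schemaColumns.toSet
  val unknownText = unknown.toSeq.sorted.mkString(", ")
  // require raises IllegalArgumentException; the library may use a different type.
  require(unknown.isEmpty, s"Fill map refers to columns not in the schema: $unknownText")
}
```

    Running a check like this before starting a job surfaces a misspelled column name immediately instead of partway through data preparation.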

  531. def setNumericRatio(value: Double): DataPrep.this.type

    Permalink

    Setter for specifying the percentage ratio for the mode 'percentage' in setLabelBalanceMode()

    value

    Double: A fractional double in the range of 0.0 to 1.0.

    Definition Classes
    AutomationConfig
    Annotations
    @throws( ... )
    Since

    0.5.1

    Exceptions thrown

    UnsupportedOperationException() if the provided value is outside of the range of 0.0 -> 1.0

    Note

    Default: 0.2

    Setting this value to 1.0 is equivalent to setting the label balance mode to 'match'

  532. def setNumericTarget(value: Int): DataPrep.this.type

    Permalink

    Setter for specifying the target row count to generate for 'target' mode in setLabelBalanceMode()

    value

    Int: The desired final number of rows per minority class label

    Definition Classes
    AutomationConfig
    Since

    0.5.1

    Note

    [WARNING] Setting this value to too high of a number will greatly increase runtime and memory pressure.

  533. def setPNorm(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  534. def setParallelism(value: Int): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  535. def setPearsonAutoFilterNTile(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  536. def setPearsonFilterDirection(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  537. def setPearsonFilterManualValue(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  538. def setPearsonFilterMode(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  539. def setPearsonFilterStatistic(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  540. def setPipelineId(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  541. def setQuorumCount(value: Int): DataPrep.this.type

    Permalink

    Setter for how many vectors to find in adjacency to the centroid for generation of synthetic data

    value

    Int: Number of vectors to find nearest each centroid within the class

    returns

    this

    Definition Classes
    AutomationConfig
    Note

    the higher the value set here, the higher the variance in synthetic data generation

  542. def setScalerMax(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  543. def setScalerMin(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  544. def setScalerType(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  545. def setScoringMetric(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  546. def setScoringOptimizationStrategy(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  547. def setSeed(value: Long): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  548. def setSplitCachingStrategy(value: String): DataPrep.this.type

    Permalink

    Setter for determining the split caching strategy (either persist to disk for each kfold split or backing to Delta)

    value

    Configuration string either 'persist' or 'delta'

    Definition Classes
    AutomationConfig
    Since

    0.7.1

  549. def setStandardScalerMeanFlagOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  550. def setStandardScalerMeanFlagOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  551. def setStandardScalerStdDevFlagOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  552. def setStandardScalerStdDevFlagOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  553. def setStringBoundaries(value: Map[String, List[String]]): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  554. def setSyntheticCol(value: String): DataPrep.this.type

    Permalink

    Setter for the name of the synthetic column

    value

    String: A unique column name that is not already part of the main DataFrame

    Definition Classes
    AutomationConfig
    Since

    0.5.1

  555. def setTrainPortion(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  556. def setTrainSplitChronologicalColumn(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  557. def setTrainSplitChronologicalRandomPercentage(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  558. def setTrainSplitMethod(value: String): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  559. def setTreeSplitsConfig(value: MainConfig): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  560. def setTreeSplitsConfig(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  561. def setUpperFilterNTile(value: Double): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  562. def setVectorMutationMethod(value: String): DataPrep.this.type

    Permalink

    Setter for the Vector Mutation Method

    value

    String - the mode to use.

    returns

    this

    Definition Classes
    AutomationConfig
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if the mode is not supported.

    Note

    Options:
    "fixed" - will use the value of minimumVectorCountToMutate to select random indexes of this number of indexes.
    "random" - will use this number as a lower bound on a random selection of indexes between this and the vector length.
    "all" - will mutate all of the vectors.
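
    The three vector mutation methods differ in how many feature-vector positions are chosen for mutation. The sketch below shows one plausible selection routine based on the descriptions above and on the note for setMinimumVectorCountToMutate. The function indexesToMutate is hypothetical, and drawing the count uniformly in "random" mode is an assumption.

```scala
import scala.util.Random

// Sketch: choose which positions of a feature vector to mutate.
// `minimumCount` plays the role of minimumVectorCountToMutate.
def indexesToMutate(
    vectorLength: Int,
    method: String,
    minimumCount: Int,
    rng: Random
): List[Int] = {
  require(minimumCount >= 0 && minimumCount <= vectorLength,
    "minimumCount must lie between 0 and the vector length")
  val allIndexes = (0 until vectorLength).toList
  method match {
    case "fixed" =>
      // Exactly minimumCount randomly chosen positions.
      rng.shuffle(allIndexes).take(minimumCount).sorted
    case "random" =>
      // At least minimumCount positions, at most the whole vector.
      val count = minimumCount + rng.nextInt(vectorLength - minimumCount + 1)
      rng.shuffle(allIndexes).take(count).sorted
    case "all" =>
      allIndexes
    case other =>
      throw new IllegalArgumentException(s"Vector mutation method '$other' is not supported")
  }
}
```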

  563. lazy val spark: SparkSession

    Permalink
    Definition Classes
    SparkSessionWrapper
  564. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  565. def toString(): String

    Permalink
    Definition Classes
    AnyRef → Any
  566. final val trainSplitMethods: List[String]

    Permalink
    Definition Classes
    Defaults
  567. def trainSplitValidation(trainSplitMethod: String, modelSelection: String): String

    Permalink
    Definition Classes
    AutomationTools
  568. def varianceFilterOff(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  569. def varianceFilterOn(): DataPrep.this.type

    Permalink
    Definition Classes
    AutomationConfig
  570. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  571. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  572. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
