com.databricks.labs.automl.feature

KSampling

class KSampling extends KSamplingBase

Linear Supertypes
KSamplingBase, SparkSessionWrapper, Serializable, Serializable, KSamplingDefaults, AnyRef, Any

Instance Constructors

  1. new KSampling(df: DataFrame)

Type Members

  1. case class MapTypeVal(colName: String, colValue: Column) extends Product with Serializable

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def defaultFeaturesCol: String

    Definition Classes
    KSamplingDefaults
  7. def defaultFieldsToIgnore: Array[String]

    Definition Classes
    KSamplingDefaults
  8. def defaultFill: Map[DataType, Any]

    Definition Classes
    KSamplingDefaults
  9. def defaultHashTables: Int

    Definition Classes
    KSamplingDefaults
  10. def defaultKGroups: Int

    Definition Classes
    KSamplingDefaults
  11. def defaultKMeansDistanceMeasurement: String

    Definition Classes
    KSamplingDefaults
  12. def defaultKMeansMaxIter: Int

    Definition Classes
    KSamplingDefaults
  13. def defaultKMeansPredictionCol: String

    Definition Classes
    KSamplingDefaults
  14. def defaultKMeansSeed: Long

    Definition Classes
    KSamplingDefaults
  15. def defaultKMeansTolerance: Double

    Definition Classes
    KSamplingDefaults
  16. def defaultLSHOutputCol: String

    Definition Classes
    KSamplingDefaults
  17. def defaultLSHSeed: Long

    Definition Classes
    KSamplingDefaults
  18. def defaultLabelCol: String

    Definition Classes
    KSamplingDefaults
  19. def defaultMinimumVectorCountToMutate: Int

    Definition Classes
    KSamplingDefaults
  20. def defaultMutationMode: String

    Definition Classes
    KSamplingDefaults
  21. def defaultMutationValue: Double

    Definition Classes
    KSamplingDefaults
  22. def defaultQuorumCount: Int

    Definition Classes
    KSamplingDefaults
  23. def defaultSyntheticCol: String

    Definition Classes
    KSamplingDefaults
  24. def defaultVectorMutationMethod: String

    Definition Classes
    KSamplingDefaults
  25. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  26. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  27. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  28. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  29. def getKSamplingConfig: KSamplingConfiguration

    Returns the current state of the configuration as a new instance of KSamplingConfiguration.

    returns

    the current state of the KSamplingConfiguration conf

    Definition Classes
    KSamplingBase
  30. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  31. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  32. def makeRows(labelValues: Array[RowGenerationConfig]): DataFrame

    Main method for generating synthetic data.

    labelValues

    Array[RowGenerationConfig] specifying which categorical label values to generate data for and the target count for each.

    returns

    A DataFrame of synthetic data with an added field that flags the rows as synthetic.

  33. def mutateValueFixed(first: Double, second: Double, mutationValue: Double): Double

    Method for mutating between two row values, applied to all features in the rows. A mutation value sets the ratio between the min and max values.

    first

    a value to mix with the second variable

    second

    a value to mix with the first variable

    mutationValue

    the ratio of mixing between the two variables.

    returns

    The scaled value between first and second

    Since

    0.5.1
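
    The mixing described for mutateValueFixed can be sketched in plain Scala. This is a minimal illustration and not the library's implementation: the names MutationSketch and mixFixed are hypothetical, and weighting the smaller input by mutationValue and the larger by (1 - mutationValue) is an assumption based on the "ratio between the min and max values" wording.

```scala
object MutationSketch {
  // Hypothetical helper, not part of the library.
  // Assumption: mutationValue weights the smaller input and
  // (1 - mutationValue) weights the larger one.
  def mixFixed(first: Double, second: Double, mutationValue: Double): Double = {
    require(mutationValue >= 0.0 && mutationValue <= 1.0,
      "mutationValue must be in [0, 1]")
    val lo = math.min(first, second)
    val hi = math.max(first, second)
    lo * mutationValue + hi * (1.0 - mutationValue)
  }
}
```

    Under this assumption the result always lies between the two inputs, and the argument order does not matter because the inputs are sorted into min and max first.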

  34. def mutateValueRandom(first: Double, second: Double): Double

    Method for randomly mutating between the bounds of two values.

    first

    a value to mix

    second

    a value to mix

    returns

    the randomly mutated value

    Since

    0.5.1
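
    A random mutation between two bounds could look like the sketch below. The names RandomMutationSketch and mixRandom are hypothetical, and drawing the position uniformly between the smaller and larger value is an assumption; the source only states that the result is a random value between the two bounds.

```scala
import scala.util.Random

object RandomMutationSketch {
  // Hypothetical helper, not part of the library.
  // Assumption: the result is drawn uniformly from [min, max).
  def mixRandom(first: Double, second: Double, rng: Random = new Random()): Double = {
    val lo = math.min(first, second)
    val hi = math.max(first, second)
    lo + (hi - lo) * rng.nextDouble()
  }
}
```

    Passing a seeded Random makes the draw reproducible, which helps when comparing runs.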

  35. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  36. final def notify(): Unit

    Definition Classes
    AnyRef
  37. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  38. def ratioValueFixed(first: Double, second: Double, mutationValue: Double): Double

    Method for mutating between row values with a fixed ratio value.

    first

    a value to mix

    second

    a value to mix

    mutationValue

    ratio modifier between the two values

    returns

    the mutated value

    Since

    0.5.1

  39. lazy val sc: SparkContext

    Definition Classes
    SparkSessionWrapper
  40. def scalaToSparkTypeConversion(scalaType: String): DataType

  41. def setFeaturesCol(value: String): KSampling.this.type

    Setter for the Feature Column name of the input DataFrame.

    value

    String: name of the feature vector column

    returns

    this

    Definition Classes
    KSamplingBase
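
    Each setter in this class returns this (typed as KSampling.this.type), which is what allows calls to be chained. The sketch below shows the pattern on a hypothetical stand-in class; SamplerConfigSketch, its getters, and the default values "features" and "label" are assumptions for illustration and are not taken from the library.

```scala
// Hypothetical stand-in class illustrating the `this.type` setter pattern.
class SamplerConfigSketch {
  // Assumed defaults, chosen only for this example.
  private var featuresCol: String = "features"
  private var labelCol: String = "label"

  // Returning this.type keeps the static type of the receiver,
  // so chaining also works on subclasses.
  def setFeaturesCol(value: String): this.type = { featuresCol = value; this }
  def setLabelCol(value: String): this.type = { labelCol = value; this }

  def getFeaturesCol: String = featuresCol
  def getLabelCol: String = labelCol
}
```

    With this shape a configuration reads as one expression, for example new SamplerConfigSketch().setFeaturesCol("f").setLabelCol("y").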
  42. def setFieldsToIgnore(value: Array[String]): KSampling.this.type

    Setter to provide a listing of any fields that are intended to be ignored in the generated DataFrame.

    value

    Array[String]: field names to ignore in the data generation aspect

    returns

    this

    Definition Classes
    KSamplingBase
  43. def setKGroups(value: Int): KSampling.this.type

    Setter for specifying the number of K-Groups to generate in the KMeans model.

    value

    Int: number of k groups to generate

    returns

    this

    Definition Classes
    KSamplingBase
  44. def setKMeansDistanceMeasurement(value: String): KSampling.this.type

    Setter for which distance measurement to use to calculate the nearness of vectors to a centroid.

    value

    String: "euclidean" or "cosine". Default: "euclidean".

    returns

    this

    Definition Classes
    KSamplingBase
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if an invalid value is entered
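
    The validation described for this setter can be sketched with Scala's require, which throws IllegalArgumentException when its condition is false. DistanceMeasureSketch and validate are hypothetical names; only the two allowed values come from the documentation above, and whether the library uses require internally is not stated.

```scala
object DistanceMeasureSketch {
  // The two options listed in the documentation for this setter.
  private val allowed: Set[String] = Set("euclidean", "cosine")

  // Hypothetical helper: returns the value unchanged when it is allowed,
  // otherwise require throws IllegalArgumentException.
  def validate(value: String): String = {
    require(allowed.contains(value),
      s"Distance measurement '$value' is not supported; use one of: ${allowed.mkString(", ")}")
    value
  }
}
```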

  45. def setKMeansMaxIter(value: Int): KSampling.this.type

    Setter for specifying the maximum number of iterations for the KMeans model to go through to converge.

    value

    Int: Maximum limit on iterations

    returns

    this

    Definition Classes
    KSamplingBase
  46. def setKMeansPredictionCol(value: String): KSampling.this.type

    Setter for the internal KMeans column for cluster membership attribution.

    value

    String: column name for internal algorithm column for group membership

    returns

    this

    Definition Classes
    KSamplingBase
  47. def setKMeansSeed(value: Long): KSampling.this.type

    Setter for a KMeans seed for the clustering algorithm.

    value

    Long: Seed value

    returns

    this

    Definition Classes
    KSamplingBase
  48. def setKMeansTolerance(value: Double): KSampling.this.type

    Setter for the KMeans tolerance (must be > 0).

    value

    The tolerance value setting for KMeans

    returns

    this

    Definition Classes
    KSamplingBase
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if a value that is not greater than 0 is entered

    See also

    reference: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.clustering.KMeans for further details.

  49. def setLSHHashTables(value: Int): KSampling.this.type

    Setter for configuring the number of hash tables to use for MinHashLSH.

    value

    Int: Count of hash tables to use

    returns

    this

    Definition Classes
    KSamplingBase
    See also

    http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.MinHashLSH for more information

  50. def setLSHOutputCol(value: String): KSampling.this.type

    Setter for the internal LSH output hash information column.

    value

    String: column name for the internal MinHashLSH Model transformation value

    returns

    this

    Definition Classes
    KSamplingBase
  51. def setLSHSeed(value: Long): KSampling.this.type

    Setter for a MinHashLSH seed value for the model.

    value

    Long: a seed value

    returns

    this

    Definition Classes
    KSamplingBase
  52. def setLabelCol(value: String): KSampling.this.type

    Setter for the Label Column name of the input DataFrame.

    value

    String: name of the label column

    returns

    this

    Definition Classes
    KSamplingBase
  53. def setMinimumVectorCountToMutate(value: Int): KSampling.this.type

    Setter for minimum threshold for vector indexes to mutate within the feature vector.

    value

    The minimum (or fixed) number of indexes to mutate.

    returns

    this

    Definition Classes
    KSamplingBase
    Note

    With vectorMutationMethod "fixed", this sets the exact number of vector positions to mutate. With vectorMutationMethod "random", this sets the lower bound: at least this many indexes will be mutated.

  54. def setMutationMode(value: String): KSampling.this.type

    Setter for the Mutation Mode of the feature vector individual values.

    value

    String: the mode to use.

    returns

    this

    Definition Classes
    KSamplingBase
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if the mode is not supported.

    Note

    Options:
    "weighted" - uses weighted averaging to scale the euclidean distance between the centroid vector and mutation candidate vectors
    "random" - randomly selects a position on the euclidean vector between the centroid vector and the candidate mutation vectors
    "ratio" - uses a ratio between the values of the centroid vector and the mutation vector

  55. def setMutationValue(value: Double): KSampling.this.type

    Setter for specifying the mutation magnitude for the modes 'weighted' and 'ratio' in mutationMode.

    value

    Double: value between 0 and 1 for mutation magnitude adjustment.

    returns

    this

    Definition Classes
    KSamplingBase
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if the value specified is outside of the range (0, 1)

    Note

    the higher this value, the closer to the centroid vector vs. the candidate mutation vector the synthetic row data will be.
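
    The direction of this note can be illustrated with an element-wise weighted mix of a centroid vector and a candidate vector. This is a sketch under assumptions, not the library's code: WeightedVectorSketch and mix are hypothetical names, the formula centroid * mutationValue + candidate * (1 - mutationValue) is one reading of the note, and treating the bounds 0 and 1 as excluded is an interpretation of the "(0, 1)" notation.

```scala
object WeightedVectorSketch {
  // Hypothetical helper, not part of the library.
  // Assumption: the centroid is weighted by mutationValue and the
  // candidate by (1 - mutationValue), so a higher value lands closer
  // to the centroid.
  def mix(centroid: Array[Double],
          candidate: Array[Double],
          mutationValue: Double): Array[Double] = {
    require(mutationValue > 0.0 && mutationValue < 1.0,
      "mutationValue must be inside the range (0, 1)")
    require(centroid.length == candidate.length,
      "centroid and candidate must have the same length")
    centroid.zip(candidate).map { case (c, x) =>
      c * mutationValue + x * (1.0 - mutationValue)
    }
  }
}
```

    For example, with centroid (0.0, 4.0), candidate (8.0, 0.0), and a mutation value of 0.75, the sketch yields (2.0, 3.0), which sits three quarters of the way from the candidate toward the centroid.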

  56. def setQuorumCount(value: Int): KSampling.this.type

    Setter for how many vectors to find in adjacency to the centroid for generation of synthetic data.

    value

    Int: number of vectors to find nearest to each centroid within the class

    returns

    this

    Definition Classes
    KSamplingBase
    Note

    the higher the value set here, the higher the variance in synthetic data generation
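
    To make the role of the quorum count concrete, the sketch below picks the quorumCount vectors closest to a centroid using an exact brute-force Euclidean search. This is a swapped-in technique for illustration only: the class exposes MinHashLSH settings, so the library's neighbor search is likely approximate and is not reproduced here. QuorumSketch, euclidean, and nearest are hypothetical names.

```scala
object QuorumSketch {
  // Plain Euclidean distance between two equal-length vectors.
  def euclidean(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
  }

  // Hypothetical brute-force stand-in: sort all vectors by distance to
  // the centroid and keep the closest quorumCount of them.
  def nearest(centroid: Array[Double],
              vectors: Seq[Array[Double]],
              quorumCount: Int): Seq[Array[Double]] = {
    require(quorumCount >= 0, "quorumCount must not be negative")
    vectors.sortBy(v => euclidean(centroid, v)).take(quorumCount)
  }
}
```

    A larger quorumCount admits vectors that lie farther from the centroid, which is consistent with the note that higher values increase the variance of the generated data.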

  57. def setSyntheticCol(value: String): KSampling.this.type

    Setter for the name to be used for the synthetic column flag that is attached to the output DataFrame as an indication that the data present is generated and not original.

    value

    String: name to be used throughout the job to delineate the fact that the data in the row is generated.

    returns

    this

    Definition Classes
    KSamplingBase
  58. def setVectorMutationMethod(value: String): KSampling.this.type

    Setter for the Vector Mutation Method.

    value

    String: the mode to use.

    returns

    this

    Definition Classes
    KSamplingBase
    Annotations
    @throws( classOf[IllegalArgumentException] )
    Exceptions thrown

    IllegalArgumentException() if the mode is not supported.

    Note

    Options:
    "fixed" - will use the value of minimumVectorCountToMutate as the number of random indexes to select.
    "random" - will use the value of minimumVectorCountToMutate as a lower bound on a random selection of indexes between that value and the vector length.
    "all" - will mutate every index of the vector.
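
    The three options can be sketched as an index-selection function. This is an illustrative reading of the note and not the library's implementation: IndexSelectionSketch and select are hypothetical names, and the uniform choice of the count in "random" mode is an assumption.

```scala
import scala.util.Random

object IndexSelectionSketch {
  // Hypothetical helper: returns the vector indexes that would be mutated.
  def select(method: String, vectorLength: Int, minCount: Int, rng: Random): Seq[Int] = {
    require(minCount >= 0 && minCount <= vectorLength,
      "minCount must be between 0 and the vector length")
    val all = (0 until vectorLength).toList
    method match {
      // Exactly minCount randomly chosen indexes.
      case "fixed" => rng.shuffle(all).take(minCount).sorted
      // Assumption: the count is uniform between minCount and vectorLength.
      case "random" =>
        val count = minCount + rng.nextInt(vectorLength - minCount + 1)
        rng.shuffle(all).take(count).sorted
      // Every index of the vector.
      case "all" => all
      case other =>
        throw new IllegalArgumentException(s"Unsupported vector mutation method: $other")
    }
  }
}
```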

  59. lazy val spark: SparkSession

    Definition Classes
    SparkSessionWrapper
  60. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  61. def toString(): String

    Definition Classes
    AnyRef → Any
  62. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  63. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  64. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
