
org.apache.spark.ml.automl.feature

BinaryEncoder

Related Docs: object BinaryEncoder | package feature

class BinaryEncoder extends Estimator[BinaryEncoderModel] with DefaultParamsWritable with HasInputCols with HasOutputCols with BinaryEncoderBase

Linear Supertypes
BinaryEncoderBase, HasHandleInvalid, HasOutputCols, HasInputCols, DefaultParamsWritable, MLWritable, Estimator[BinaryEncoderModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new BinaryEncoder()

  2. new BinaryEncoder(uid: String)


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. final def clear(param: Param[_]): BinaryEncoder.this.type

    Definition Classes
    Params
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def copy(extra: ParamMap): BinaryEncoder

    Definition Classes
    BinaryEncoder → Estimator → PipelineStage → Params
  9. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  10. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  13. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  14. def explainParams(): String

    Definition Classes
    Params
  15. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  16. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. def fit(dataset: Dataset[_]): BinaryEncoderModel


    Main fit method: builds a BinaryEncoderModel from the dataset and the input and output columns configured through the setters. The primary principle at work here is dimensionality reduction when encoding extremely high-cardinality StringIndexed columns. One-hot encoding (OHE) works well for nominal columns in general, but for high-cardinality columns it generates an extremely large number of columns, and the side effect is increased memory pressure. This encoder allows a lossy reduction of that space by distilling the information into a binary encoding whose width is set dynamically by the length of the binary representation of the size of the nominal space.

    dataset

    The dataset (or DataFrame) used in training the model

    returns

    BinaryEncoderModel - a serializable artifact that has the output schema and encoding embedded within it.

    Definition Classes
    BinaryEncoder → Estimator
    Example:
    If the cardinality of a nominal column is 113, its binary representation is 1110001. With OHE, this would result in 113 (or 114 if allowing invalids) binary positions within a sparse vector, effectively creating 113 or 114 feature columns. With BinaryEncoder, only 7 (or 8, if allowing invalids) dense vector positions are needed to represent the same categories.

    Since

    0.5.3

    Note

    Due to the nature of this encoding and how most models learn, this is considered a lossy encoding. However, because high-cardinality non-numeric nominal fields are frequently discarded to avoid exploding the size of the data set, this encoder makes it possible to use high-cardinality fields that otherwise could not be included.
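
    The encoding described above can be sketched in plain Scala. This is a hypothetical illustration, not the library's implementation: `BinaryEncodingSketch`, `encodingWidth`, and `encode` are invented names, the input is assumed to be a StringIndexer-style index, and the width follows the example's arithmetic (the binary length of the cardinality).

    ```scala
    // Hypothetical sketch of binary encoding for an indexed nominal column;
    // not the BinaryEncoder implementation.
    object BinaryEncodingSketch {

      // Number of binary digits in the cardinality, e.g. 113 -> "1110001" -> 7.
      def encodingWidth(cardinality: Int): Int = {
        require(cardinality > 0, "cardinality must be positive")
        Integer.toBinaryString(cardinality).length
      }

      // Encodes one index as 0.0/1.0 values, most significant bit first,
      // left-padded with zeros to `width` positions.
      def encode(index: Int, width: Int): Array[Double] = {
        // Width is capped at 32 so every shift below stays within Int range.
        require(width >= 1 && width <= 32, s"unsupported width $width")
        require(index >= 0 && index.toLong < (1L << width), s"index $index does not fit in $width bits")
        Array.tabulate(width) { pos =>
          val shift = width - 1 - pos
          ((index >> shift) & 1).toDouble
        }
      }
    }
    ```

    With these definitions, `encodingWidth(113)` is 7, matching the example, and `encode(5, 7)` yields the dense positions 0, 0, 0, 0, 1, 0, 1, whereas one-hot encoding the same column would need 113 positions.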

  19. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[BinaryEncoderModel]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  20. def fit(dataset: Dataset[_], paramMap: ParamMap): BinaryEncoderModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  21. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): BinaryEncoderModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  22. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  23. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  24. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  25. final def getHandleInvalid: String

    Definition Classes
    HasHandleInvalid
  26. final def getInputCols: Array[String]

    Definition Classes
    HasInputCols
  27. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  28. final def getOutputCols: Array[String]

    Definition Classes
    HasOutputCols
  29. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  30. val handleInvalid: Param[String]


    Parameter that configures how invalid entries (values not seen during training) in a previously modeled feature column are handled.

    Definition Classes
    BinaryEncoderBase → HasHandleInvalid
  31. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  32. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  33. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  34. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  35. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  36. final val inputCols: StringArrayParam

    Definition Classes
    HasInputCols
  37. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  38. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  39. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  40. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  41. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  42. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  43. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  44. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  45. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  46. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  47. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  48. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  49. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  50. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  51. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  52. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  53. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  54. final def notify(): Unit

    Definition Classes
    AnyRef
  55. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  56. final val outputCols: StringArrayParam

    Definition Classes
    HasOutputCols
  57. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  58. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  59. final def set(paramPair: ParamPair[_]): BinaryEncoder.this.type

    Attributes
    protected
    Definition Classes
    Params
  60. final def set(param: String, value: Any): BinaryEncoder.this.type

    Attributes
    protected
    Definition Classes
    Params
  61. final def set[T](param: Param[T], value: T): BinaryEncoder.this.type

    Definition Classes
    Params
  62. final def setDefault(paramPairs: ParamPair[_]*): BinaryEncoder.this.type

    Attributes
    protected
    Definition Classes
    Params
  63. final def setDefault[T](param: Param[T], value: T): BinaryEncoder.this.type

    Attributes
    protected
    Definition Classes
    Params
  64. def setHandleInvalid(value: String): BinaryEncoder.this.type


    Sets how previously unseen values arriving at a trained model are handled: either 'keep' or 'error' (default: 'error'). With 'keep', an additional vector position is added to the output column so that no collision with real data can occur, and for an unseen value every position of the DenseVector output (an Array[Double]) is set to 1.

    value

    String: either 'keep' or 'error' (Default: 'error')

    Annotations
    @throws( classOf[SparkException] )
    Since

    0.5.3

    Exceptions thrown

    SparkException if the supplied configuration value is neither 'keep' nor 'error'
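
    A minimal sketch of the 'keep' / 'error' behavior described above, in plain Scala without Spark. `HandleInvalidSketch` and its methods are hypothetical names; the sketch throws IllegalArgumentException where the real setter throws SparkException, and it assumes the extra position means an unseen value becomes baseWidth + 1 ones.

    ```scala
    // Hypothetical sketch of handleInvalid validation and unseen-value encoding;
    // not the BinaryEncoder implementation.
    object HandleInvalidSketch {

      // The two options the setter documents as valid.
      val supportedOptions: Set[String] = Set("keep", "error")

      // Rejects anything other than 'keep' or 'error'. The documented setter throws
      // SparkException; IllegalArgumentException stands in to avoid a Spark dependency.
      def validateOption(value: String): String = {
        if (!supportedOptions.contains(value))
          throw new IllegalArgumentException(
            s"handleInvalid must be 'keep' or 'error', got '$value'")
        value
      }

      // Encoding for a value not seen during training. With 'keep', the output has one
      // extra position (baseWidth + 1) and every position is 1.0; with 'error', the
      // value is rejected (the exception type used here is an assumption).
      def encodeUnseen(baseWidth: Int, handleInvalid: String): Array[Double] =
        if (validateOption(handleInvalid) == "keep") Array.fill(baseWidth + 1)(1.0)
        else throw new NoSuchElementException("unseen value with handleInvalid = 'error'")
    }
    ```

    Validating the option eagerly, as the setter does, means a misconfiguration surfaces when the encoder is configured rather than later during a transform.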

  65. def setInputCols(values: Array[String]): BinaryEncoder.this.type


    Sets the array of input columns to be encoded with the BinaryEncoder.

    values

    Array of column names

    Since

    0.5.3

  66. def setOutputCols(values: Array[String]): BinaryEncoder.this.type


    Sets the array of output column names produced when a trained model's .transform is run on a dataset with a compatible schema.

    values

    Array of column names that will be generated through a .transform

    Since

    0.5.3

  67. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  68. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  69. def transformSchema(schema: StructType): StructType

    Definition Classes
    BinaryEncoder → PipelineStage
  70. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  71. val uid: String

    Definition Classes
    BinaryEncoder → Identifiable
  72. def validateAndTransformSchema(schema: StructType, keepInvalid: Boolean): StructType


    Validates the schema that results from fitting and transforming with this encoder. Validation ensures that the supplied input columns are of the correct binary or nominal (ordinal numeric) type and that the number of output columns is correct for the configured settings.

    schema

    The schema of the dataset supplied to train the model, or of the dataset to be transformed by the model

    keepInvalid

    Boolean flag indicating whether to reserve an additional binary encoding value for values unknown at model-training time. Such values are converted to a 'max binary value' whose length is the encoding length + 1, with every one of those n positions set to 1.

    returns

    StructType that represents the transformed schema with additional output columns appended to the dataset structure.

    Attributes
    protected
    Definition Classes
    BinaryEncoderBase
    Annotations
    @throws( ... )
    Since

    0.5.3

    Exceptions thrown

    UnsupportedOperationException if the configured input columns and output columns differ in length.
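
    A hypothetical sketch of the column-count check and output sizing described above. It operates on plain column names instead of a Spark StructType, and the names `SchemaValidationSketch`, `pairColumns`, and `outputWidth` are invented for illustration; the width rule assumes the same sizing as the fit example (binary length of the cardinality, plus one position when keepInvalid is true).

    ```scala
    // Hypothetical sketch of schema validation for paired input/output columns;
    // not the BinaryEncoderBase implementation.
    object SchemaValidationSketch {

      // Pairs each input column with its output column, failing when the two
      // arrays differ in length (the documented UnsupportedOperationException case).
      def pairColumns(inputCols: Array[String], outputCols: Array[String]): Seq[(String, String)] = {
        if (inputCols.length != outputCols.length)
          throw new UnsupportedOperationException(
            s"inputCols (${inputCols.length}) and outputCols (${outputCols.length}) must have the same length")
        inputCols.toSeq.zip(outputCols.toSeq)
      }

      // Output vector size for one column: the binary length of the cardinality,
      // plus one extra position when invalid (unseen) values are kept.
      def outputWidth(cardinality: Int, keepInvalid: Boolean): Int =
        Integer.toBinaryString(cardinality).length + (if (keepInvalid) 1 else 0)
    }
    ```

    For the cardinality-113 example, `outputWidth(113, keepInvalid = true)` is 8, consistent with the "7 (or 8, if allowing invalids)" positions described for fit.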

  73. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  74. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  75. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  76. def write: MLWriter

    Definition Classes
    DefaultParamsWritable → MLWritable
