org.apache.spark.ml.odkl

QuantileDiscretizer

final class QuantileDiscretizer extends Estimator[Bucketizer] with QuantileDiscretizerBase with DefaultParamsWritable with HasInputCols with HasOutputCols

QuantileDiscretizer takes a column of continuous features and outputs a column of binned categorical features. The number of bins is set with the numBuckets parameter. The number of buckets actually used may be smaller than this value, for example when the input has too few distinct values to produce enough distinct quantiles.

Since 2.3.0, QuantileDiscretizer can map multiple columns at once by setting the inputCols parameter. If both inputCol and inputCols are set, an exception is thrown. To specify a different number of buckets for each column, set the numBucketsArray parameter; if the number of buckets should be the same across all columns, numBuckets can be set as a convenience.
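The effect of too few distinct values can be illustrated without Spark. The following is a minimal sketch, not the library's implementation: it assumes exact quantiles on an in-memory sequence (the real estimator uses an approximate algorithm) and assumes that duplicate split points are collapsed. The object and method names are hypothetical.

```scala
object FewDistinctValues {
  /** Exact quantile split points, deduplicated and bounded by -Infinity and +Infinity.
    * Illustrative only: the real estimator computes approximate quantiles on a Dataset. */
  def quantileSplits(values: Seq[Double], numBuckets: Int): Array[Double] = {
    require(numBuckets >= 2, "numBuckets must be greater than or equal to 2")
    val sorted = values.filterNot(_.isNaN).sorted.toArray
    require(sorted.nonEmpty, "need at least one non-NaN value")
    // Interior split points at probabilities 1/n, 2/n, ..., (n-1)/n.
    val interior = (1 until numBuckets).map { i =>
      val rank = math.ceil(i.toDouble / numBuckets * sorted.length).toInt - 1
      sorted(math.max(rank, 0))
    }
    // Collapsing repeated split points is what reduces the bucket count.
    (Double.NegativeInfinity +: interior.distinct :+ Double.PositiveInfinity).toArray
  }

  def main(args: Array[String]): Unit = {
    val manyDistinct = (1 to 100).map(_.toDouble)
    val fewDistinct = Seq.fill(50)(1.0) ++ Seq.fill(50)(2.0)
    // Number of buckets = number of split points - 1.
    println(quantileSplits(manyDistinct, 4).length - 1) // prints 4
    println(quantileSplits(fewDistinct, 4).length - 1)  // prints 3
  }
}
```

With only two distinct input values, two of the three requested interior split points coincide, so the sketch ends up with three buckets instead of four.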

NaN handling: null and NaN values are ignored when fitting QuantileDiscretizer. Fitting produces a Bucketizer model for making predictions. During transformation, Bucketizer by default raises an error when it finds NaN values in the dataset, but the user can instead choose to keep or remove NaN values by setting handleInvalid. If the user chooses to keep NaN values, they are handled specially and placed into their own bucket: for example, if 4 buckets are used, non-NaN data is put into buckets[0-3] and NaNs are counted in a special bucket[4].
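The three handleInvalid options can be sketched as a plain bucket lookup. This is an illustrative sketch under stated assumptions, not the Bucketizer source: the object and method names are hypothetical, and the interval convention (half-open buckets, with the last bucket closed on the right) is an assumption made for the example.

```scala
object NaNBucketSketch {
  /** Returns the bucket index for `value`, or None when the row would be dropped ("skip").
    * `splits` must start with -Infinity and end with +Infinity. */
  def bucketOf(splits: Array[Double], value: Double, handleInvalid: String): Option[Int] = {
    val numBuckets = splits.length - 1
    if (value.isNaN) {
      handleInvalid match {
        case "keep"  => Some(numBuckets) // extra bucket after the regular ones
        case "skip"  => None             // row is filtered out
        case "error" => throw new IllegalArgumentException("NaN value found in input")
        case other   => throw new IllegalArgumentException(s"Unknown handleInvalid: $other")
      }
    } else {
      // Bucket i covers [splits(i), splits(i + 1)); the last bucket also includes its upper bound.
      val idx = splits.lastIndexWhere(_ <= value)
      Some(math.min(idx, numBuckets - 1))
    }
  }

  def main(args: Array[String]): Unit = {
    // Four regular buckets, indexed 0 to 3.
    val splits = Array(Double.NegativeInfinity, 1.0, 2.0, 3.0, Double.PositiveInfinity)
    println(bucketOf(splits, 0.5, "keep"))        // prints Some(0)
    println(bucketOf(splits, 2.5, "keep"))        // prints Some(2)
    println(bucketOf(splits, Double.NaN, "keep")) // prints Some(4)
    println(bucketOf(splits, Double.NaN, "skip")) // prints None
  }
}
```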

Algorithm: The bin ranges are chosen using an approximate algorithm (see the documentation for org.apache.spark.sql.DataFrameStatFunctions.approxQuantile for a detailed description). The precision of the approximation can be controlled with the relativeError parameter. The lower and upper bin bounds will be -Infinity and +Infinity, covering all real values.
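To make the relationship between approxQuantile and the bin bounds concrete, the sketch below derives split points by hand. It is an approximation of the idea described above, not the estimator's actual code: the helper name splitsFor, the column name, and the sample data are hypothetical, and the deduplication step is an assumption. It requires Spark on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ApproxQuantileSplits {
  /** Builds bucket boundaries from approximate quantiles of one numeric column. */
  def splitsFor(df: DataFrame, column: String, numBuckets: Int, relativeError: Double): Array[Double] = {
    // Probabilities 1/n, 2/n, ..., (n-1)/n give the interior boundaries.
    val probabilities = (1 until numBuckets).map(_.toDouble / numBuckets).toArray
    val interior = df.stat.approxQuantile(column, probabilities, relativeError)
    // The outermost bounds are infinite so that every real value falls into some bucket.
    Double.NegativeInfinity +: interior.distinct :+ Double.PositiveInfinity
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("approx-quantile-splits")
      .getOrCreate()
    import spark.implicits._

    val df = (1 to 1000).map(_.toDouble).toDF("value")
    println(splitsFor(df, "value", 4, 0.001).mkString(", "))
    spark.stop()
  }
}
```

A smaller relativeError gives boundaries closer to the exact quantiles at the cost of more computation; a value of 0 requests exact quantiles.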

Annotations
@Since( "1.6.0" )
Linear Supertypes
HasOutputCols, HasInputCols, DefaultParamsWritable, MLWritable, QuantileDiscretizerBase, HasOutputCol, HasInputCol, HasHandleInvalid, Estimator[Bucketizer], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new QuantileDiscretizer()

    Annotations
    @Since( "1.6.0" )
  2. new QuantileDiscretizer(uid: String)

    Annotations
    @Since( "1.6.0" )

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. final def clear(param: Param[_]): QuantileDiscretizer.this.type

    Definition Classes
    Params
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def copy(extra: ParamMap): QuantileDiscretizer

    Definition Classes
    QuantileDiscretizer → Estimator → PipelineStage → Params
    Annotations
    @Since( "1.6.0" )
  9. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  10. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  13. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  14. def explainParams(): String

    Definition Classes
    Params
  15. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  16. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. def fit(dataset: Dataset[_]): Bucketizer

    Definition Classes
    QuantileDiscretizer → Estimator
    Annotations
    @Since( "2.0.0" )
  19. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[Bucketizer]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  20. def fit(dataset: Dataset[_], paramMap: ParamMap): Bucketizer

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  21. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): Bucketizer

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  22. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  23. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  24. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  25. final def getHandleInvalid: String

    Definition Classes
    HasHandleInvalid
  26. def getInOutCols: (Array[String], Array[String])

  27. final def getInputCol: String

    Definition Classes
    HasInputCol
  28. final def getInputCols: Array[String]

    Definition Classes
    HasInputCols
  29. def getNumBuckets: Int

    Definition Classes
    QuantileDiscretizerBase
  30. def getNumBucketsArray: Array[Int]

    Definition Classes
    QuantileDiscretizerBase
  31. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  32. final def getOutputCol: String

    Definition Classes
    HasOutputCol
  33. final def getOutputCols: Array[String]

    Definition Classes
    HasOutputCols
  34. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  35. def getRelativeError: Double

    Definition Classes
    QuantileDiscretizerBase
  36. val handleInvalid: Param[String]

    Param for how to handle invalid entries. Options are 'skip' (filter out rows with invalid values), 'error' (throw an error), or 'keep' (keep invalid values in a special additional bucket). Note that in the multiple columns case, the invalid handling is applied to all columns. That is, for 'error' it will throw an error if any invalids are found in any column, for 'skip' it will skip rows with any invalids in any column, and so on. Default: "error"

    Definition Classes
    QuantileDiscretizerBase → HasHandleInvalid
    Annotations
    @Since( "2.1.0" )
  37. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  38. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  39. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  40. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  41. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  42. final val inputCol: Param[String]

    Definition Classes
    HasInputCol
  43. final val inputCols: StringArrayParam

    Definition Classes
    HasInputCols
  44. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  45. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  46. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  47. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  48. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  49. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  50. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  51. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  52. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  53. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  54. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  55. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  56. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  57. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  58. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  59. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  60. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  61. final def notify(): Unit

    Definition Classes
    AnyRef
  62. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  63. val numBuckets: IntParam

    Number of buckets (quantiles, or categories) into which data points are grouped. Must be greater than or equal to 2.

    See also handleInvalid, which can optionally create an additional bucket for NaN values.

    default: 2

    Definition Classes
    QuantileDiscretizerBase
  64. val numBucketsArray: IntArrayParam

    Array of number of buckets (quantiles, or categories) into which data points are grouped. Each value must be greater than or equal to 2.

    See also handleInvalid, which can optionally create an additional bucket for NaN values.

    Definition Classes
    QuantileDiscretizerBase
  65. final val outputCol: Param[String]

    Definition Classes
    HasOutputCol
  66. final val outputCols: StringArrayParam

    Definition Classes
    HasOutputCols
  67. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  68. val relativeError: DoubleParam

    Relative error (see the documentation for org.apache.spark.sql.DataFrameStatFunctions.approxQuantile for a description). Must be in the range [0, 1]. Note that in the multiple columns case, the relative error is applied to all columns. default: 0.001

    Definition Classes
    QuantileDiscretizerBase
  69. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  70. final def set(paramPair: ParamPair[_]): QuantileDiscretizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  71. final def set(param: String, value: Any): QuantileDiscretizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  72. final def set[T](param: Param[T], value: T): QuantileDiscretizer.this.type

    Definition Classes
    Params
  73. final def setDefault(paramPairs: ParamPair[_]*): QuantileDiscretizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  74. final def setDefault[T](param: Param[T], value: T): QuantileDiscretizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  75. def setHandleInvalid(value: String): QuantileDiscretizer.this.type

    Annotations
    @Since( "2.1.0" )
  76. def setInputCol(value: String): QuantileDiscretizer.this.type

    Annotations
    @Since( "1.6.0" )
  77. def setInputCols(value: Array[String]): QuantileDiscretizer.this.type

    Annotations
    @Since( "2.3.0" )
  78. def setNumBuckets(value: Int): QuantileDiscretizer.this.type

    Annotations
    @Since( "1.6.0" )
  79. def setNumBucketsArray(value: Array[Int]): QuantileDiscretizer.this.type

    Annotations
    @Since( "2.3.0" )
  80. def setOutputCol(value: String): QuantileDiscretizer.this.type

    Annotations
    @Since( "1.6.0" )
  81. def setOutputCols(value: Array[String]): QuantileDiscretizer.this.type

    Annotations
    @Since( "2.3.0" )
  82. def setRelativeError(value: Double): QuantileDiscretizer.this.type

    Annotations
    @Since( "2.0.0" )
  83. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  84. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  85. def transformSchema(schema: StructType): StructType

    Definition Classes
    QuantileDiscretizer → PipelineStage
    Annotations
    @Since( "1.6.0" )
  86. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  87. val uid: String

    Definition Classes
    QuantileDiscretizer → Identifiable
    Annotations
    @Since( "1.6.0" )
  88. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  89. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  90. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  91. def write: MLWriter

    Definition Classes
    QuantileDiscretizer → DefaultParamsWritable → MLWritable
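
The setters listed above can be combined into a multi-column pipeline step. The following is a usage sketch only: it assumes Spark and the artifact providing org.apache.spark.ml.odkl are on the classpath, and the column names and sample data are hypothetical. The setters and fit are the ones documented on this page; transform comes from the fitted Bucketizer being a Spark Model.

```scala
import org.apache.spark.ml.odkl.QuantileDiscretizer
import org.apache.spark.sql.SparkSession

object QuantileDiscretizerUsage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("quantile-discretizer-usage")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; one row has a NaN in the "load" column.
    val df = Seq(
      (18.0, 1.0), (19.0, 2.0), (8.0, 3.0), (5.0, Double.NaN), (2.2, 5.0)
    ).toDF("hour", "load")

    val discretizer = new QuantileDiscretizer()
      .setInputCols(Array("hour", "load"))
      .setOutputCols(Array("hourBucket", "loadBucket"))
      .setNumBucketsArray(Array(3, 2)) // one bucket count per input column
      .setRelativeError(0.001)
      .setHandleInvalid("keep")        // NaN rows go into an extra bucket

    // Fitting computes the split points and returns a Bucketizer model.
    val bucketizer = discretizer.fit(df)
    bucketizer.transform(df).show()

    spark.stop()
  }
}
```

Only the multi-column parameters are set here, since setting both inputCol and inputCols is documented above as an error.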
