org.apache.spark.ml.odkl

DSVRGD

abstract class DSVRGD[M <: ModelWithSummary[M]] extends Estimator[M] with SummarizableEstimator[M] with HasPredictionCol with HasFeaturesCol with HasLabelCol with HasRegParam with HasElasticNetParam with HasNetlibBlas with HasMaxIter with HasTol with HasCacheTrainData

Created by dmitriybugaichenko on 10.11.16.

Implementation of a distributed version of Stochastic Variance Reduced Gradient Descent. The idea is taken from https://arxiv.org/abs/1512.01708: the input dataset is partitioned, and the workers perform descent simultaneously, each updating its own copy of the weights at each random point (following the SGD scheme). At the end of each epoch, data from all workers is collected and aggregated. Variance reduction is achieved by keeping the average gradient from previous iterations (g) and evaluating the gradient at one extra point, the average of all weights seen during the previous epoch (w_avg). The update rule is:

w_new = w_old − η (∇f_i(w_old) − ∇f_i(w_avg) + g)
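
The update rule can be illustrated with a minimal single-machine sketch in plain Scala. It is not the DSVRGD implementation: the squared loss, the `SvrgSketch` object and the `gradient` and `step` helpers are hypothetical names chosen for illustration, and no partitioning or aggregation across workers is modelled.

```scala
object SvrgSketch {
  type Vec = Array[Double]

  // Gradient of the squared loss 0.5 * (w . x - y)^2 for a single instance.
  // The loss is an assumption made for this sketch only.
  def gradient(w: Vec, x: Vec, y: Double): Vec = {
    val margin = w.zip(x).map { case (wi, xi) => wi * xi }.sum - y
    x.map(_ * margin)
  }

  // One variance-reduced step:
  //   w_new = w_old - eta * (grad_i(w_old) - grad_i(w_avg) + g)
  // where g is the average gradient kept from the previous epoch.
  def step(wOld: Vec, wAvg: Vec, g: Vec, x: Vec, y: Double, eta: Double): Vec = {
    val gOld = gradient(wOld, x, y)
    val gAvg = gradient(wAvg, x, y)
    wOld.indices.map(j => wOld(j) - eta * (gOld(j) - gAvg(j) + g(j))).toArray
  }
}
```

When `wOld` equals `wAvg`, the two per-instance gradients cancel and the step reduces to a plain gradient step along `g`, which is the intuition behind the variance reduction.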

TODO: Other variance reduction and step size tuning techniques might be applied.

Requires AttributeGroup metadata for both labels and features. Supports elastic net regularization and parallel training of multiple labels (similar to MatrixLBFGS).
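
As a rough illustration of how elastic net settings are commonly turned into separate L1 and L2 strengths, the sketch below follows the convention documented for Spark ML's `elasticNetParam` (a mixing parameter in [0, 1], where 0 means a pure L2 penalty and 1 a pure L1 penalty). Whether DSVRGD derives its `l1Scalar` and `l2Scalar` in exactly this way is an assumption; `ElasticNetSketch`, `split` and `perLabel` are hypothetical names.

```scala
object ElasticNetSketch {
  // Splits the overall regularization strength into (l1Scalar, l2Scalar).
  // Assumption: the usual elastic net convention, not confirmed for DSVRGD.
  def split(regParam: Double, elasticNetParam: Double): (Double, Double) = {
    require(elasticNetParam >= 0.0 && elasticNetParam <= 1.0,
      "elasticNetParam must be in [0, 1]")
    (elasticNetParam * regParam, (1.0 - elasticNetParam) * regParam)
  }

  // Expands a scalar strength into a constant per-label vector, mirroring the
  // "constant by default" behaviour described for evaluateL1Regularization
  // and evaluateL2Regularization on this page.
  def perLabel(scalar: Double, numLabels: Int): Array[Double] =
    Array.fill(numLabels)(scalar)
}
```

For example, `ElasticNetSketch.split(2.0, 0.5)` gives equal L1 and L2 strengths of 1.0 each.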

Linear Supertypes
HasCacheTrainData, HasTol, HasMaxIter, HasNetlibBlas, HasElasticNetParam, HasRegParam, HasLabelCol, HasFeaturesCol, HasPredictionCol, SummarizableEstimator[M], Estimator[M], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DSVRGD(uid: String)

Abstract Value Members

  1. abstract def addGradient(weights: Matrix, features: DenseMatrix, labels: DenseMatrix, updateTerm: DenseMatrix, marginCache: DenseMatrix, lossCache: DenseVector): Unit

    For a single instance and weights, calculates the gradient and loss. Depending on direction, adds the gradient and loss to the accumulated data.

    weights

    Weights to evaluate gradient at

    features

    Features of the instance to evaluate gradient at

    labels

    Labels of the instance to evaluate gradient at

    updateTerm

    Update term to store gradient at

    lossCache

    Loss vector to record resulting loss values.

    Attributes
    protected
  2. abstract def extractModel(labelAttributeGroup: AttributeGroup, numLabels: Int, weights: Matrix, dataset: DataFrame): M

    Given the labels info and the weights matrix, creates the appropriate ML models.

    Attributes
    protected
  3. abstract def weightsDistanceForLabel(oldWeights: Matrix, newWeights: DenseMatrix, label: Int): Double

    Evaluates the distance between the old and new weights for a label.

    oldWeights

    Weights from the previous epoch

    newWeights

    Weights from the current epoch.

    label

    Label to check for convergence.

    returns

    Distance between old and new weights.

    Attributes
    protected
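
Concrete subclasses supply `weightsDistanceForLabel`. One plausible choice, sketched below in plain Scala, is the Euclidean distance between the old and new weight rows of a label. The row-per-label layout, the use of nested arrays instead of Spark's `Matrix` types, and the `WeightsDistanceSketch` name are all assumptions made for illustration; actual subclasses may use a different metric.

```scala
object WeightsDistanceSketch {
  // Euclidean distance between the weight rows of one label.
  // Assumption: weights are laid out with one row per label.
  def distanceForLabel(
      oldWeights: Array[Array[Double]],
      newWeights: Array[Array[Double]],
      label: Int): Double = {
    val oldRow = oldWeights(label)
    val newRow = newWeights(label)
    require(oldRow.length == newRow.length, "rows must have the same length")
    math.sqrt(oldRow.zip(newRow).map { case (a, b) => (a - b) * (a - b) }.sum)
  }
}
```

A label whose distance falls below the configured tolerance could then be treated as converged and excluded from later epochs.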

Concrete Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  5. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  6. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  7. def addL1Reg(l1regParam: Vector, weights: DenseMatrix, updateTerm: DenseMatrix, lossCache: DenseVector, skipRegFeature: Int): DenseMatrix

    Attributes
    protected
  8. def addL2Reg(l2regParam: Vector, weights: DenseMatrix, updateTerm: DenseMatrix, lossCache: DenseVector, skipRegFeature: Int): DenseMatrix

    Adds L2 regularization part to the gradient and loss.

    Attributes
    protected
  9. def adjust(direction: Int, learningRates: DenseMatrix, updateTerm: DenseMatrix, weights: DenseMatrix): DenseMatrix

  10. def applyL1Shrinkage(regParam: Vector, weights: DenseMatrix, skipRegFeature: Int, notDegraded: Set[Int]): DenseMatrix

    Apply L1 shrinkage to the updated weights.

    Attributes
    protected
  11. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  12. def axpy(a: Double, x: Vector, y: Array[Double]): Unit

    Definition Classes
    HasNetlibBlas
  13. def axpy(a: Double, x: Array[Double], y: Array[Double]): Unit

    Definition Classes
    HasNetlibBlas
  14. def axpyCompensated(updateTerm: Array[Double], sum: Array[Double], compensator: Array[Double], y: Array[Double], t: Array[Double]): Unit

  15. def blas: BLAS

    Definition Classes
    HasNetlibBlas
  16. final val cacheTrainData: BooleanParam

    Definition Classes
    HasCacheTrainData
  17. final def clear(param: Param[_]): DSVRGD.this.type

    Definition Classes
    Params
  18. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. val convergenceMode: Param[String]

  20. def copy(extra: ParamMap): DSVRGD[M]

    Definition Classes
    DSVRGD → SummarizableEstimator → Estimator → PipelineStage → Params
  21. def copy(x: Array[Double], y: Array[Double]): Unit

    Definition Classes
    HasNetlibBlas
  22. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  23. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  24. def dscal(a: Double, data: Array[Double]): Unit

    Definition Classes
    HasNetlibBlas
  25. final val elasticNetParam: DoubleParam

    Definition Classes
    HasElasticNetParam
  26. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  28. def evaluateL1Regularization(data: DataFrame, l1Scalar: Double, numLabels: Int): Vector

    Given the L1 regularization config, creates a vector with a per-label regularization parameter (constant by default).

    Attributes
    protected
  29. def evaluateL2Regularization(data: DataFrame, l2Scalar: Double, numLabels: Int): Vector

    Given the L2 regularization config, creates a vector with a per-label regularization parameter (constant by default).

    Attributes
    protected
  30. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  31. def explainParams(): String

    Definition Classes
    Params
  32. def extractBlock(lossHistory: Array[CompactBuffer[Double]], dataset: DataFrame, names: Map[Int, String], sc: SparkContext): DataFrame

  33. def extractLabelVectors(labelAttributeGroup: AttributeGroup, numLabels: Int, weights: Matrix): Map[String, Vector]

    Utility used to split the weights matrix into a label -> vector map.

    Attributes
    protected
  34. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  35. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  36. def extractRow(label: Int, weights: Matrix): Vector

    Extracts a single row from a matrix.

    Attributes
    protected
  37. def extractSummaryBlocks(lossHistory: Array[CompactBuffer[Double]], weightDiffHistory: Array[CompactBuffer[Double]], weightNormHistory: Array[CompactBuffer[Double]], dataset: DataFrame, labelAttributeGroup: AttributeGroup): Map[Block, DataFrame]

    Extracts summary blocks from the per-iteration loss history.

    Attributes
    protected
  38. def f2jBLAS: BLAS

    Definition Classes
    HasNetlibBlas
  39. final val featuresCol: Param[String]

    Definition Classes
    HasFeaturesCol
  40. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  41. def fit(dataset: Dataset[_]): M

    Definition Classes
    DSVRGD → Estimator
  42. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[M]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  43. def fit(dataset: Dataset[_], paramMap: ParamMap): M

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  44. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): M

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  45. def fullGradientAndLoss(l1regParam: Vector, l2regParam: Vector, localWeights: DenseMatrix, marginCache: DenseMatrix, lossCache: DenseVector, updateTerm: DenseMatrix, skipRegFeature: Int, features: DenseMatrix, labels: DenseMatrix): Any

  46. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  47. final def getCacheTrainData: Boolean

    Definition Classes
    HasCacheTrainData
  48. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  49. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  50. final def getElasticNetParam: Double

    Definition Classes
    HasElasticNetParam
  51. final def getFeaturesCol: String

    Definition Classes
    HasFeaturesCol
  52. final def getLabelCol: String

    Definition Classes
    HasLabelCol
  53. final def getMaxIter: Int

    Definition Classes
    HasMaxIter
  54. def getNotConverged(activeLabels: Map[Int, Int], lossHistory: Array[CompactBuffer[Double]], weightDiffHistory: Array[CompactBuffer[Double]], weightNormHistory: Array[CompactBuffer[Double]], tolerance: Double): Array[Int]

    Extracts the labels that have not yet converged, based on the current and previous weights and on the loss history.

    Attributes
    protected
  55. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  56. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  57. final def getPredictionCol: String

    Definition Classes
    HasPredictionCol
  58. final def getRegParam: Double

    Definition Classes
    HasRegParam
  59. final def getTol: Double

    Definition Classes
    HasTol
  60. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  61. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  62. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  63. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  64. def initializeWeights(data: DataFrame, numLabels: Int, numFeatures: Int): Matrix

  65. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  66. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  67. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  68. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  69. final val labelCol: Param[String]

    Definition Classes
    HasLabelCol
  70. val lastIsIntercept: BooleanParam

  71. val learningRate: DoubleParam

  72. val localMinibatchSize: Param[Int]

  73. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  74. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  75. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  76. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  77. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  78. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  79. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  80. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  81. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  82. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  83. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  84. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  85. def lossDifferenceForLabel(lossHistory: Array[CompactBuffer[Double]], label: Int): Double

    Evaluates loss difference simply as relative change

  86. val lossIncreaseTolerance: DoubleParam

  87. final val maxIter: IntParam

    Definition Classes
    HasMaxIter
  88. def merge(labelsMap: Map[Int, Int], weights: Matrix, newWeights: DenseMatrix): DenseMatrix

    Merges weights from the new epoch with the overall weights. The dimensions of the weight matrices might differ when some of the labels have already converged and no longer participate in the descent.

    Attributes
    protected
  89. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  90. final def notify(): Unit

    Definition Classes
    AnyRef
  91. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  92. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  93. final val predictionCol: Param[String]

    Definition Classes
    HasPredictionCol
  94. final val regParam: DoubleParam

    Definition Classes
    HasRegParam
  95. def relabel(activeLabels: Array[Int], labels: Vector): DenseVector

    Keeps only the active (not yet converged) labels in a vector.

    Attributes
    protected
  96. def relabelMatrix(activeLabels: Array[Int], matrix: Matrix): Matrix

    Keeps only the active (not yet converged) labels in a matrix.

    Attributes
    protected
  97. final def set(paramPair: ParamPair[_]): DSVRGD.this.type

    Attributes
    protected
    Definition Classes
    Params
  98. final def set(param: String, value: Any): DSVRGD.this.type

    Attributes
    protected
    Definition Classes
    Params
  99. final def set[T](param: Param[T], value: T): DSVRGD.this.type

    Definition Classes
    Params
  100. def setCacheTrainData(value: Boolean): DSVRGD.this.type

    Definition Classes
    HasCacheTrainData
  101. def setConvergenceMode(value: String): DSVRGD.this.type

  102. final def setDefault(paramPairs: ParamPair[_]*): DSVRGD.this.type

    Attributes
    protected
    Definition Classes
    Params
  103. final def setDefault[T](param: Param[T], value: T): DSVRGD.this.type

    Attributes
    protected
    Definition Classes
    Params
  104. def setElasticNetParam(value: Double): DSVRGD.this.type

  105. def setLastIsIntercept(value: Boolean): DSVRGD.this.type

  106. def setLearningRate(value: Double): DSVRGD.this.type

  107. def setLocalMinibatchSize(value: Int): DSVRGD.this.type

  108. def setMaxIter(value: Int): DSVRGD.this.type

  109. def setRegParam(value: Double): DSVRGD.this.type

  110. def setSlowDownFactor(value: Double): DSVRGD.this.type

  111. def setSpeedUpFactor(value: Double): DSVRGD.this.type

  112. def setTol(value: Double): DSVRGD.this.type

  113. def singleStep(data: RDD[(Vector, DenseVector)], weights: Broadcast[Matrix], avgWeights: Broadcast[Matrix], avgGradient: Broadcast[Matrix], l1regParam: Vector, l2regParam: Vector, stepNum: Int, labelLearningRates: DenseVector): DistributedSgdState

    A single epoch of the descent.

    data

    Data with features and labels

    weights

    Weights matrix to start with.

    avgWeights

    Average of the weights visited during the previous epoch.

    avgGradient

    Average of the gradients seen during the previous epoch.

    l1regParam

    Vector with the strength of L1 regularization (null if disabled)

    l2regParam

    Vector with the strength of L2 regularization (null if disabled)

    stepNum

    Epoch number

    returns

    State with weights, averages and loss from this epoch

    Attributes
    protected
  114. val slowDownFactor: DoubleParam

  115. val speedUpFactor: DoubleParam

  116. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  117. def toDense(weights: Broadcast[Matrix]): DenseMatrix

  118. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  119. final val tol: DoubleParam

    Definition Classes
    HasTol
  120. def transformSchema(schema: StructType): StructType

    Definition Classes
    DSVRGD → PipelineStage
    Annotations
    @DeveloperApi()
  121. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  122. val uid: String

    Definition Classes
    DSVRGD → Identifiable
  123. def updateWeights(stepSize: Double, updateTerm: DenseMatrix, weights: DenseMatrix): Unit

    Updates the weights given update term and current value.

    Attributes
    protected
  124. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  125. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  126. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  127. def weightNorm(newWeights: Matrix, label: Int, skipRegFeature: Int): Double

    Evaluates weight norm for a given label.

    newWeights

    Weights matrix

    label

    Label to evaluate the weight norm for

    returns

    Weights norm.

    Attributes
    protected
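
The `applyL1Shrinkage` member listed above is described as applying L1 shrinkage to the updated weights. The standard operator for this kind of shrinkage is soft-thresholding, sketched below in plain Scala. This is an illustration under assumptions, not the DSVRGD code: `L1ShrinkageSketch`, `softThreshold` and `shrinkRow` are hypothetical names, and `skipFeature` is assumed to be the index of a feature (for example an intercept) excluded from regularization, with a negative value meaning that nothing is skipped.

```scala
object L1ShrinkageSketch {
  // Soft-thresholding: moves w towards zero by `threshold`,
  // clamping at zero when |w| <= threshold.
  def softThreshold(w: Double, threshold: Double): Double =
    math.signum(w) * math.max(math.abs(w) - threshold, 0.0)

  // Applies soft-thresholding to every weight of one label,
  // leaving the (assumed) unregularized feature untouched.
  def shrinkRow(weights: Array[Double], threshold: Double, skipFeature: Int): Array[Double] =
    weights.zipWithIndex.map { case (w, j) =>
      if (j == skipFeature) w else softThreshold(w, threshold)
    }
}
```

Weights whose magnitude is below the threshold become exactly zero, which is what makes L1 regularization produce sparse models.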
