org.apache.spark.ml.odkl

UnwrappedStage

object UnwrappedStage extends Serializable

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Type Members

  1. class CachingTransformer[M <: ModelWithSummary[M]] extends Model[CachingTransformer[M]] with ModelTransformer[M, CachingTransformer[M]]

    Utility used to inject caching.

  2. class CollectSummaryToParquetTransformer[M <: ModelWithSummary[M]] extends ModelOnlyTransformer[M, CollectSummaryToParquetTransformer[M]]

    Collects all summary blocks and materializes them into a single partition, then saves the result to Parquet so that it does not waste memory.

  3. class CollectSummaryTransformer[M <: ModelWithSummary[M]] extends ModelOnlyTransformer[M, CollectSummaryTransformer[M]]

    Collects all summary blocks and materializes them into a single partition.

  4. class DynamicDataTransformerTrainer[M <: ModelWithSummary[M]] extends Estimator[IdentityModelTransformer[M]] with DefaultParamsWritable with PartitioningParams

  5. class DynamicDownsamplerTrainer extends Estimator[SamplingTransformer] with SamplerParams

    When training a model on a data set of unknown size, adds the ability to downsample it to (approximately) a predefined size.
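
    The arithmetic behind such a downsampler might look like the sketch below, assuming it counts the records and derives a sampling fraction from the target size. The `sampleFraction` helper and its parameter names are hypothetical and are not part of this API.

    ```scala
    object DownsampleSketch {
      // Hypothetical helper: fraction of records to keep so that roughly
      // `targetSize` records remain out of `totalRecords`.
      def sampleFraction(totalRecords: Long, targetSize: Long): Double = {
        require(targetSize > 0, "targetSize must be positive")
        if (totalRecords <= targetSize) 1.0 // already small enough, keep everything
        else targetSize.toDouble / totalRecords.toDouble
      }
    }
    ```

    Because the fraction is applied per record, the resulting size is only approximately equal to the target.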

  6. class DynamicPartitionerTrainer[M <: ModelWithSummary[M]] extends Estimator[IdentityModelTransformer[M]] with DefaultParamsWritable with PartitioningParams

    If the number of partitions is not known up front, use the dynamic partitioner to split the data into partitions of (approximately) a predefined size.
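
    A minimal sketch of how a partition count could be derived from a target partition size, assuming the total number of records is known after counting. The `numPartitions` helper is hypothetical and only illustrates the idea; it is not the class's actual logic.

    ```scala
    object PartitionCountSketch {
      // Hypothetical helper: number of partitions needed so that each one holds
      // roughly `recordsPerPartition` records (rounded up, never less than one).
      def numPartitions(totalRecords: Long, recordsPerPartition: Long): Int = {
        require(recordsPerPartition > 0, "recordsPerPartition must be positive")
        val needed = (totalRecords + recordsPerPartition - 1) / recordsPerPartition
        math.max(1L, needed).toInt
      }
    }
    ```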

  7. class IdentityDataTransformer extends Transformer

    Data transformer which does nothing :)

  8. class IdentityModelTransformer[M <: ModelWithSummary[M]] extends PredefinedDataTransformer[M, IdentityModelTransformer[M]]

    Model transformer applying transformation only to data, keeping the model unchanged.

  9. abstract class ModelOnlyTransformer[M <: ModelWithSummary[M], T <: ModelTransformer[M, T]] extends Model[T] with ModelTransformer[M, T] with DefaultParamsWritable

    Utility simplifying transformations when only model transformation is required.

  10. class NoTrainEstimator[M <: ModelWithSummary[M], T <: ModelTransformer[M, T]] extends Estimator[T] with DefaultParamsWritable

    Utility simplifying creation of a predefined model transformer (when no fitting is required).

  11. class OrderedCut extends Model[OrderedCut] with HasGroupByColumns

    Keeps data based on some ordered constraint.

  12. class OrderedCutEstimator extends Estimator[OrderedCut] with HasGroupByColumns

    When training a model on a data set of unknown size, adds the ability to take only the "most recent" records. Estimates the size of the dataset and calculates approximate bounds for filtering.

  13. class PartitioningTransformer extends Transformer with PartitioningParams

    Data transformer which adds partitioning.

  14. class PersistingTransformer[M <: ModelWithSummary[M]] extends Model[PersistingTransformer[M]] with ModelTransformer[M, PersistingTransformer[M]]

    Utility used to persist a portion of the data into temporary storage. Useful for grounding execution plans and avoiding massive "skips". Unlike checkpointing, it is more explicit and controllable.

  15. abstract class PredefinedDataTransformer[M <: ModelWithSummary[M], T <: ModelTransformer[M, T]] extends Model[T] with ModelTransformer[M, T] with DefaultParamsWritable

    Utility simplifying transformations when data transformation is provided externally.

  16. class ProjectingTransformer extends Transformer

    Data transformer for projecting.

  17. trait SamplerParams extends HasSeed with DefaultParamsWritable

    Parameters for sampling

  18. class SamplingTransformer extends Model[SamplingTransformer] with SamplerParams

    Data transformer which takes a sample of the data. The resulting dataframe is constructed so that results are non-deterministic and might vary from run to run, unless a seed is specified or sampling with replacement is enabled; in those cases it falls back to the default dataset sampling, which is deterministic.

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def cache[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], cacher: CachingTransformer[M]): UnwrappedStage[M, CachingTransformer[M]]

    Caches the data before passing it to the estimator (caching is not part of the resulting prediction model).

  6. def cache[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], storageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): UnwrappedStage[M, CachingTransformer[M]]

    Caches the data before passing it to the estimator (caching is not part of the resulting prediction model).

  7. def cacheAndMaterialize[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], storageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): UnwrappedStage[M, CachingTransformer[M]]

    Caches the data before passing it to the estimator (caching is not part of the resulting prediction model). Forces cache materialization by calling count.
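
    The two caching variants can be sketched as follows, using only the signatures listed here. It is assumed that `ModelWithSummary` and `SummarizableEstimator` live in the same package as `UnwrappedStage`, as the unqualified names suggest; the `CachingSketch` object and its method names are illustrative.

    ```scala
    import org.apache.spark.ml.odkl.{ModelWithSummary, SummarizableEstimator, UnwrappedStage}
    import org.apache.spark.storage.StorageLevel

    object CachingSketch {
      // Lazy variant: the data is cached when the wrapped estimator first reads it.
      def cached[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.cache(estimator, StorageLevel.MEMORY_AND_DISK)

      // Eager variant: count is called so the cache is filled before fitting.
      // The storage level is omitted, so the documented default MEMORY_ONLY applies.
      def cachedEagerly[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.cacheAndMaterialize(estimator)
    }
    ```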

  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def collectSummary[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]): UnwrappedStage[M, CollectSummaryTransformer[M]]

    Collects all summary blocks to the driver and re-creates the dataframe with a single block. Useful for reducing the number of partitions and tasks for the final persist.

    estimator

    Estimator to wrap summary blocks for.

    returns

    Final model is the same, but summary blocks are collected and re-created.

  10. def collectSummaryToParquet[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], path: String): UnwrappedStage[M, CollectSummaryToParquetTransformer[M]]

    Saves summary blocks to Parquet files and re-creates the dataframe. Useful for reducing the memory footprint of tasks with a large summary (e.g. cross-validation output).

    estimator

    Estimator to wrap summary blocks for.

    path

    Where to save parquet files

    returns

    Final model is the same, but summary blocks are written as single-partition Parquet files and re-created.
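
    A usage sketch for the two summary-collecting wrappers, based on the signatures above. The path is a made-up placeholder, and the package of `ModelWithSummary` and `SummarizableEstimator` is assumed to match that of `UnwrappedStage`.

    ```scala
    import org.apache.spark.ml.odkl.{ModelWithSummary, SummarizableEstimator, UnwrappedStage}

    object SummarySketch {
      // Small summaries: collect the blocks to the driver and rebuild them as one block.
      def summaryOnDriver[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.collectSummary(estimator)

      // Large summaries: write the blocks to Parquet instead of keeping them in memory.
      // "/tmp/odkl/summary" is a hypothetical location.
      def summaryInParquet[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.collectSummaryToParquet(estimator, path = "/tmp/odkl/summary")
    }
    ```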

  11. def dataOnly[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], dataTransformer: Transformer): UnwrappedStage[M, IdentityModelTransformer[M]]

    Adds a stage with a data-only transformation (e.g. assigning folds).

  12. def dataOnlyWithTraining[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], dataTransformerFitter: Estimator[_]): UnwrappedStage[M, IdentityModelTransformer[M]]

    Adds a stage with a data-only transformation (e.g. assigning folds).
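
    As a rough illustration of a data-only stage, the sketch below filters the training data with Spark's `SQLTransformer` before the wrapped estimator sees it. The `label` column is a hypothetical example, and the package of the odkl types is assumed. Per its signature, `dataOnlyWithTraining` is called the same way but takes an unfitted `Estimator` in place of the `Transformer`.

    ```scala
    import org.apache.spark.ml.feature.SQLTransformer
    import org.apache.spark.ml.odkl.{ModelWithSummary, SummarizableEstimator, UnwrappedStage}

    object DataOnlySketch {
      // Drops rows without a label before they reach the wrapped estimator.
      // `label` is an assumed column name; __THIS__ is SQLTransformer's
      // placeholder for the input dataset.
      def withLabelFilter[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.dataOnly(
          estimator,
          new SQLTransformer().setStatement("SELECT * FROM __THIS__ WHERE label IS NOT NULL"))
    }
    ```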

  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  18. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  19. def modelOnly[M <: ModelWithSummary[M], T <: ModelTransformer[M, T]](estimator: SummarizableEstimator[M], modelTransformer: T): UnwrappedStage[M, T]

    Adds a stage with a model-only transformation (e.g. evaluation).

  20. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  21. final def notify(): Unit

    Definition Classes
    AnyRef
  22. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  23. def persistToTemp[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], tempPath: String, uncacheInput: Boolean = false, partitionBy: Array[String] = Array()): UnwrappedStage[M, PersistingTransformer[M]]

    Stores data in a temporary path. Useful for "grounding" data and avoiding large execution plans.
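
    A usage sketch based on the signature above. The temporary path is a made-up placeholder, the package of the odkl types is assumed, and `partitionBy` is left at its documented default (an empty array).

    ```scala
    import org.apache.spark.ml.odkl.{ModelWithSummary, SummarizableEstimator, UnwrappedStage}

    object PersistSketch {
      // Writes the estimator's input to a temporary location before fitting.
      // "/tmp/odkl/stage-input" is a hypothetical path; uncacheInput is switched
      // on here only to show how the flag is passed.
      def grounded[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.persistToTemp(
          estimator,
          tempPath = "/tmp/odkl/stage-input",
          uncacheInput = true)
    }
    ```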

  24. def project[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], columns: Seq[String]): UnwrappedStage[M, IdentityModelTransformer[M]]

    Keeps only a predefined set of columns in the dataset before passing it to the estimator. Useful in combination with caching to reduce the memory footprint. The projection will not appear in the resulting prediction model.

    estimator

    Estimator to call after projecting.

    columns

    Columns to keep.

    returns

    Exactly the same model as produced by the estimator.

  25. def projectInverse[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], columns: Seq[String]): UnwrappedStage[M, IdentityModelTransformer[M]]

    Removes a predefined set of columns from the dataset before passing it to the estimator. Useful in combination with caching to reduce the memory footprint. The projection will not appear in the resulting prediction model.

    estimator

    Estimator to call after projecting.

    columns

    Columns to remove.

    returns

    Exactly the same model as produced by the estimator.
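
    The difference between `project` and `projectInverse` can be sketched as below. The column names are hypothetical, and the package of the odkl types is assumed to match that of `UnwrappedStage`.

    ```scala
    import org.apache.spark.ml.odkl.{ModelWithSummary, SummarizableEstimator, UnwrappedStage}

    object ProjectionSketch {
      // Whitelist: only these (assumed) columns reach the estimator.
      def keepTrainingColumns[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.project(estimator, Seq("features", "label"))

      // Blacklist: everything except these (assumed) columns reaches the estimator.
      def dropBulkyColumns[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.projectInverse(estimator, Seq("rawText", "debugInfo"))
    }
    ```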

  26. def repartition[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], numPartitions: Int, partitionBy: Seq[String]): UnwrappedStage[M, IdentityModelTransformer[M]]

    Repartitions the data before passing it to the estimator. Repartitioning will not appear in the resulting prediction model.

    estimator

    Estimator to add partitioning to.

    numPartitions

    Number of partitions.

    partitionBy

    Columns to partition by.

    returns

    Exactly the same model as produced by the estimator.

  27. def repartition[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], numPartitions: Int): UnwrappedStage[M, IdentityModelTransformer[M]]

    Repartitions the data before passing it to the estimator. Repartitioning will not appear in the resulting prediction model.

    estimator

    Estimator to add partitioning to.

    numPartitions

    Number of partitions.

    returns

    Exactly the same model as produced by the estimator.

  28. def repartition[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], partitioner: PartitioningTransformer): UnwrappedStage[M, IdentityModelTransformer[M]]

    Repartitions the data before passing it to the estimator. Repartitioning will not appear in the resulting prediction model.

    partitioner

    Defines the logic of partitioning.

  29. def repartition[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], numPartitions: Int, partitionBy: Seq[String], sortBy: Seq[String]): UnwrappedStage[M, IdentityModelTransformer[M]]

    Repartitions the data before passing it to the estimator. Repartitioning will not appear in the resulting prediction model.

    estimator

    Estimator to add partitioning to.

    numPartitions

    Number of partitions.

    partitionBy

    Columns to partition by.

    sortBy

    Columns to sort data by within partitions. Note that the partitionBy columns are not added to this set by default.

    returns

    Exactly the same model as produced by the estimator.
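
    A sketch of two of the `repartition` overloads listed above. The partition count and column names are hypothetical, and the package of the odkl types is assumed. Since partitionBy columns are not added to sortBy automatically, the second call repeats `userId` in the sort columns.

    ```scala
    import org.apache.spark.ml.odkl.{ModelWithSummary, SummarizableEstimator, UnwrappedStage}

    object RepartitionSketch {
      // Only the number of partitions is fixed.
      def byCount[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.repartition(estimator, 200)

      // Partition by user and sort each partition by user, then by timestamp.
      def byUserSorted[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.repartition(estimator, 200, Seq("userId"), Seq("userId", "timestamp"))
    }
    ```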

  30. def sample[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M], numRecords: Int, withReplacement: Boolean = false, seed: Option[Long] = None): UnwrappedStage[M, IdentityModelTransformer[M]]

    Adds a stage for sampling data from the dataset. Behavior is deterministic (each iteration always produces the same result) if withReplacement is set or a seed is specified; otherwise the behavior is non-deterministic and subsequent iterations might see different samples.

    estimator

    Estimator to sample data for.

    numRecords

    Expected number of records to sample

    withReplacement

    Whether to simulate replacement (a single item might be selected multiple times).

    seed

    Seed for the random number generation.

    returns

    Estimator which samples the data before passing it to the nested estimator.
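
    A usage sketch for `sample`, relying on the named parameters from the signature above. The record count and seed value are arbitrary, and the package of the odkl types is assumed. Passing a seed selects the deterministic behavior described above.

    ```scala
    import org.apache.spark.ml.odkl.{ModelWithSummary, SummarizableEstimator, UnwrappedStage}

    object SamplingSketch {
      // Trains on roughly 100000 records; the explicit seed makes the sample repeatable.
      def sampled[M <: ModelWithSummary[M]](estimator: SummarizableEstimator[M]) =
        UnwrappedStage.sample(estimator, numRecords = 100000, seed = Some(42L))
    }
    ```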

  31. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  32. def toString(): String

    Definition Classes
    AnyRef → Any
  33. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  34. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. def wrap[M <: ModelWithSummary[M], T <: ModelTransformer[M, T]](estimator: SummarizableEstimator[M], unwrapableEstimator: Estimator[T]): UnwrappedStage[M, T]

    Adds a stage with data downstream transformation and model upstream transformation.
