Package com.coxautodata.waimak.dataflow

package spark

Linear Supertypes: AnyRef, Any

Type Members

  1. type CleanUpStrategy[T] = (TableName, InputSnapshots[T]) ⇒ SnapshotsToDelete[T]

  2. class FSCleanUp extends SparkDataFlowAction with Logging

    Action that deletes snapshots based on the cleanup strategy. It can clean up one or more labels.

  3. type InputSnapshots[T] = Seq[T]
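The CleanUpStrategy alias composes the aliases in this package into a plain function type. A minimal sketch of such a strategy, redeclaring the aliases so the example stands alone, and assuming (hypothetically) that snapshots are represented by their folder names rather than FileStatus:

```scala
// Aliases as declared in this package, redeclared locally so the
// sketch compiles without the Waimak dependency.
type TableName = String
type InputSnapshots[T] = Seq[T]
type SnapshotsToDelete[T] = Seq[T]
type CleanUpStrategy[T] = (TableName, InputSnapshots[T]) => SnapshotsToDelete[T]

// Hypothetical strategy: keep the `numberToKeep` most recent snapshots
// (lexicographic order suffices for yyyyMMdd-style folder names) and
// return the rest for deletion.
def keepNewest(numberToKeep: Int): CleanUpStrategy[String] =
  (_, snapshots) => snapshots.sorted.dropRight(numberToKeep)

val toDelete = keepNewest(2)(
  "label_1",
  Seq("snapshot_folder=20181128",
      "snapshot_folder=20181129",
      "snapshot_folder=20181127"))
// toDelete == Seq("snapshot_folder=20181127")
```

In the real API the type parameter is typically Hadoop's FileStatus (see ParquetDataCommitter below); the shape of the function is the same.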
  4. case class LabelCommitDefinition(basePath: String, timestampFolder: Option[String] = None, partitions: Seq[String] = Seq.empty, connection: Option[HadoopDBConnector] = None) extends Product with Serializable

  5. case class ParquetDataCommitter(outputBaseFolder: String, snapshotFolder: Option[String] = None, cleanupStrategy: Option[CleanUpStrategy[FileStatus]] = None, hadoopDBConnector: Option[HadoopDBConnector] = None) extends DataCommitter with Logging with Product with Serializable


    Adds the actions necessary to commit labels as Parquet; supports snapshot folders and interaction with a DB connector.

    Created by Alexei Perelighin on 2018/11/05

    outputBaseFolder

    folder under which each committed label will store its data. Ex: baseFolder/label_1/

    snapshotFolder

    optional name of the snapshot folder that will be used by all of the labels committed via this committer. It needs to be a full name and must not be the same as any previous snapshot folder of any of the commit-managed labels. Ex:
    baseFolder/label_1/snapshot_folder=20181128
    baseFolder/label_1/snapshot_folder=20181129
    baseFolder/label_2/snapshot_folder=20181128
    baseFolder/label_2/snapshot_folder=20181129

    cleanupStrategy

    optional function that takes the list of available snapshots and returns the list of snapshots to remove

    hadoopDBConnector

    optional connector to the DB.
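The parameter docs above imply a simple on-disk layout: committed data for a label lands under outputBaseFolder/label, with an optional snapshot subfolder. The helper below is a hypothetical illustration of that layout (not part of the Waimak API), reproducing the example paths shown above:

```scala
// Hypothetical helper, for illustration only: builds the committed path
// described in the ParquetDataCommitter parameter docs.
def committedPath(outputBaseFolder: String,
                  label: String,
                  snapshotFolder: Option[String]): String =
  (Seq(outputBaseFolder, label) ++ snapshotFolder).mkString("/")

// committedPath("baseFolder", "label_1", Some("snapshot_folder=20181128"))
//   == "baseFolder/label_1/snapshot_folder=20181128"
```

With snapshotFolder = None the label's data is written directly under baseFolder/label_1, which is why a cleanup strategy only makes sense when snapshot folders are in use.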

  6. class SimpleAction extends SparkDataFlowAction

    Instances of this class build a bridge between the OOP part of the Waimak engine and the functional definition of the data flow.

    Created by Alexei Perelighin on 03/11/17.

  7. type SnapshotsToDelete[T] = Seq[T]

  8. class SparkDataFlow extends DataFlow with Logging

    Introduces a Spark session into data flows.

  9. trait SparkDataFlowAction extends DataFlowAction
  10. case class SparkDataFlowInfo(spark: SparkSession, inputs: DataFlowEntities, actions: Seq[DataFlowAction], sqlTables: Set[String], tempFolder: Option[Path], schedulingMeta: SchedulingMeta, commitLabels: Map[String, LabelCommitDefinition] = Map.empty, tagState: DataFlowTagState = ..., commitMeta: CommitMeta = CommitMeta.empty, executor: DataFlowExecutor = Waimak.sparkExecutor()) extends Product with Serializable

  11. case class SparkFlowContext(spark: SparkSession) extends FlowContext with Product with Serializable


    Context required in a Spark data flow (SparkSession and FileSystem)

    Created by Vicky Avison on 23/02/2018.

    spark

    the SparkSession

  12. class SparkSimpleAction extends SimpleAction

    Spark-specific simple action that sets Spark-specific generics.

  13. type TableName = String

Value Members

  1. object ParquetDataCommitter extends Serializable

  2. object SparkActionHelpers

  3. object SparkActions

    Defines implicits for functional builders of the data flows.

    Created by Vicky Avison, Alexei Perelighin and Alex Bush

  4. object SparkDataFlow

  5. object SparkFlowReporter extends FlowReporter

  6. object SparkInterceptors extends Logging

    Defines builder functions that add various interceptors to a SparkDataFlow

    Created by Alexei Perelighin on 2018/02/24
