Trait

com.ebiznext.comet.job.ingest

IngestionJob

trait IngestionJob extends SparkJob

Linear Supertypes
SparkJob, StrictLogging, AnyRef, Any

Abstract Value Members

  1. abstract def domain: Domain

  2. abstract def ingest(dataset: DataFrame): (RDD[_], RDD[_])

    Ingestion algorithm: validate the loaded dataset and return the rejected and the accepted records as a pair of RDDs (see the sketch after this list).

  3. abstract def loadDataSet(): Try[DataFrame]

    Dataset loading strategy (JSON / CSV / ...); see the sketch after this list.

    returns

    Spark DataFrame loaded using the metadata options

  4. abstract def name: String

    Definition Classes
    SparkJob

  5. abstract def path: List[Path]

  6. abstract def schema: Schema

  7. abstract def schemaHandler: SchemaHandler

  8. implicit abstract def settings: Settings

    Definition Classes
    SparkJob

  9. abstract def storageHandler: StorageHandler

  10. abstract def types: List[Type]
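
For illustration, here is a minimal, standalone sketch of the accepted / rejected split an ingest implementation is expected to produce. The (rejected, accepted) pair ordering, the "age" column and the null check are assumptions made for the example; a real implementation validates every attribute against the configured Schema and Types.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Row}

    // Hypothetical ingest body: rows with a null "age" column are rejected
    // (reported as error strings); all other rows are accepted as-is.
    def ingestSketch(dataset: DataFrame): (RDD[_], RDD[_]) = {
      val ageIndex = dataset.schema.fieldIndex("age")
      val rejected: RDD[String] = dataset.rdd
        .filter(row => row.isNullAt(ageIndex))
        .map(row => s"null value for required attribute 'age': $row")
      val accepted: RDD[Row] =
        dataset.rdd.filter(row => !row.isNullAt(ageIndex))
      (rejected, accepted)
    }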
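
Similarly, a sketch of a loadDataSet strategy for CSV input, wrapped in Try as the signature requires. The reader options and the input path are hypothetical stand-ins for what a real implementation derives from the merged metadata and the path member.

    import scala.util.Try
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Hypothetical CSV loading strategy; the options mirror typical
    // metadata settings (header, separator) and the path is a placeholder.
    def loadCsvSketch(session: SparkSession): Try[DataFrame] = Try {
      session.read
        .option("header", "true") // metadata: header handling
        .option("sep", ";")       // metadata: separator
        .csv("/incoming/mydomain/myschema.csv")
    }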

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any

  2. final def ##(): Int

    Definition Classes
    AnyRef → Any

  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any

  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any

  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef

  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any

  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )

  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any

  10. def getWriteMode(): WriteMode

  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any

  12. def index(mergedDF: DataFrame): Unit

  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any

  14. val logger: Logger

    Attributes
    protected
    Definition Classes
    StrictLogging
  15. def merge(inputDF: DataFrame, existingDF: DataFrame, merge: MergeOptions): DataFrame

    Merge the incoming and the existing dataframes using the merge options (see the sketch after this list).

    returns

    merged dataframe

  16. lazy val metadata: Metadata

    Merged metadata

  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef

  18. final def notify(): Unit

    Definition Classes
    AnyRef

  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef

  20. val now: Timestamp

  21. def partitionDataset(dataset: DataFrame, partition: List[String]): DataFrame

    Definition Classes
    SparkJob
  22. def partitionedDatasetWriter(dataset: DataFrame, partition: List[String]): DataFrameWriter[Row]

    Partition a dataset using dataset columns. To partition the dataset using the ingestion time, use the reserved column names:

    • comet_year
    • comet_month
    • comet_day
    • comet_hour
    • comet_minute

    These columns are renamed to "year", "month", "day", "hour" and "minute" in the dataset, and their values are set to the current date/time. See the usage sketch after this list.

    dataset

    : Input dataset

    partition

    : List of columns to use for partitioning.

    returns

    A DataFrameWriter[Row] partitioned by the given columns

    Definition Classes
    SparkJob
  23. def run(): Try[SparkSession]

    Main entry point as required by the Spark Job interface (see the usage sketch after this list).

    returns

    The Spark session used for the job

    Definition Classes
    IngestionJob → SparkJob
  24. def saveAccepted(acceptedDF: DataFrame): (DataFrame, Path)

    Merge the new and the existing datasets if required, then save using overwrite / append mode.

  25. def saveRejected(rejectedRDD: RDD[String]): Try[Path]

  26. def saveRows(dataset: DataFrame, targetPath: Path, writeMode: WriteMode, area: StorageArea, merge: Boolean): (DataFrameWriter[Row], String)

    Save the typed dataset in parquet. If Hive support is active, also register it as a Hive table; if analyze is active, also compute basic statistics (see the sketch after this list).

    dataset

    : dataset to save

    targetPath

    : absolute path

    writeMode

    : append or overwrite

    area

    : accepted or rejected area

  27. lazy val session: SparkSession

    Definition Classes
    SparkJob

  28. lazy val sparkEnv: SparkEnv

    Definition Classes
    SparkJob

  29. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef

  30. def toString(): String

    Definition Classes
    AnyRef → Any

  31. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

  32. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

  33. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
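
The following sketches illustrate some of the members above. First, merge: a standalone sketch of the documented semantics, where incoming and existing rows are unioned and the most recent record per business key wins. Reducing MergeOptions to a key list plus a timestamp column name is an assumption made for the example.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, row_number}

    // Hypothetical merge: both dataframes must share the same schema.
    // The newest row per key, ordered by the timestamp column, is kept.
    def mergeSketch(inputDF: DataFrame, existingDF: DataFrame,
                    key: List[String], timestamp: String): DataFrame = {
      val window = Window.partitionBy(key.map(col): _*).orderBy(col(timestamp).desc)
      inputDF.union(existingDF)
        .withColumn("comet_rank", row_number().over(window)) // temporary column
        .where(col("comet_rank") === 1)
        .drop("comet_rank")
    }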
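
Next, a usage sketch for partitionedDatasetWriter, contrasting a regular partition column with the reserved ingestion-time columns. The column name "country" and the output path are hypothetical; only the reserved names come from the documentation above.

    import org.apache.spark.sql.DataFrame
    import com.ebiznext.comet.job.ingest.IngestionJob

    def writePartitioned(job: IngestionJob, df: DataFrame): Unit = {
      // Partition by an existing dataset column.
      job.partitionedDatasetWriter(df, List("country"))
        .mode("append")
        .parquet("/datasets/accepted/mydomain/myschema")

      // Reserved names: comet_year / comet_month / comet_day are renamed
      // to "year" / "month" / "day" and filled with the ingestion date.
      job.partitionedDatasetWriter(df, List("comet_year", "comet_month", "comet_day"))
        .mode("append")
        .parquet("/datasets/accepted/mydomain/myschema")
    }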
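
For run, a usage sketch that handles the returned Try; how the concrete job instance is obtained is out of scope here.

    import scala.util.{Failure, Success}
    import com.ebiznext.comet.job.ingest.IngestionJob

    def launch(job: IngestionJob): Unit =
      job.run() match {
        case Success(session) =>
          println(s"ingestion succeeded (app: ${session.sparkContext.appName})")
        case Failure(exception) =>
          println(s"ingestion failed: ${exception.getMessage}")
      }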
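
Finally, for saveRows, a sketch of the Hive-related steps it describes: register the parquet output as a table and compute basic statistics. This is plain Spark SQL rather than the trait's internal code, and every database, table and location name is hypothetical.

    import org.apache.spark.sql.SparkSession

    def registerAndAnalyze(session: SparkSession): Unit = {
      session.sql("CREATE DATABASE IF NOT EXISTS mydomain")
      // Register the parquet files written by saveRows as a table.
      session.sql(
        """CREATE TABLE IF NOT EXISTS mydomain.myschema
          |USING parquet
          |LOCATION '/datasets/accepted/mydomain/myschema'""".stripMargin)
      // Compute basic statistics when analyze is active.
      session.sql("ANALYZE TABLE mydomain.myschema COMPUTE STATISTICS")
    }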
