Class KafkaIngestionJob

com.ebiznext.comet.job.ingest

class KafkaIngestionJob extends JsonIngestionJob

Main class to ingest JSON messages from Kafka

Linear Supertypes
JsonIngestionJob, IngestionJob, SparkJob, JobBase, StrictLogging, AnyRef, Any

Instance Constructors

  1. new KafkaIngestionJob(domain: Domain, schema: Schema, types: List[Type], path: List[Path], storageHandler: StorageHandler, schemaHandler: SchemaHandler, options: Map[String, String], mode: Mode)(implicit settings: Settings)

    domain: Output Dataset Domain
    schema: Topic Name
    types: List of globally defined types
    path: Unused
    storageHandler: Storage Handler
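    A minimal construction sketch, assuming the Domain, Schema, types and handlers have already been loaded from the project configuration; the import paths below are assumptions, not taken from this page:

      object KafkaIngestionExample {

        import com.ebiznext.comet.config.Settings
        import com.ebiznext.comet.schema.handlers.{SchemaHandler, StorageHandler}
        import com.ebiznext.comet.schema.model.{Domain, Mode, Schema, Type}
        import com.ebiznext.comet.utils.JobResult
        import scala.util.{Failure, Success, Try}

        // Builds the job from already-loaded configuration objects and runs it.
        def ingestTopic(
            domain: Domain,
            schema: Schema,          // the schema name is the Kafka topic name
            types: List[Type],
            storageHandler: StorageHandler,
            schemaHandler: SchemaHandler,
            mode: Mode
        )(implicit settings: Settings): Try[JobResult] = {
          val job = new KafkaIngestionJob(
            domain,
            schema,
            types,
            Nil,                     // path: unused for Kafka ingestion
            storageHandler,
            schemaHandler,
            Map.empty[String, String], // options
            mode
          )
          job.run() match {
            case success @ Success(_) => success
            case failure @ Failure(e) =>
              println(s"Kafka ingestion failed: ${e.getMessage}")
              failure
          }
        }
      }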

Type Members

  1. type JdbcConfigName = String

    Definition Classes
    JobBase

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def analyze(fullTableName: String): Any

    Attributes
    protected
    Definition Classes
    SparkJob
  5. def appendToFile(storageHandler: StorageHandler, dataToSave: DataFrame, path: Path, datasetName: String, tableName: String): Unit

    Saves a dataset. If the path is empty (the first time metrics are computed for the schema), the data is written directly. If Parquet files are already stored there, a temporary directory is created to compute on, and the path is then flushed to move the updated metrics into it.

    dataToSave: dataset to be saved
    path: Path to save the file at

    Attributes
    protected
    Definition Classes
    SparkJob
  6. def applyIgnore(dfIn: DataFrame): Dataset[Row]

    Attributes
    protected
    Definition Classes
    IngestionJob
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def createSparkViews(views: Views, sqlParameters: Map[String, String]): Unit

    Attributes
    protected
    Definition Classes
    SparkJob
  10. val domain: Domain

    Input Dataset Domain

    Definition Classes
    JsonIngestionJob → IngestionJob
  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def getWriteMode(): WriteMode

    Definition Classes
    IngestionJob
  16. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  17. def ingest(dataset: DataFrame): (RDD[_], RDD[_])

    Where the magic happens.

    dataset: input dataset as an RDD of strings

    Attributes
    protected
    Definition Classes
    JsonIngestionJob → IngestionJob
  18. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  19. def loadDataSet(): Try[DataFrame]

    Loads the JSON as an RDD of String.

    returns
    Spark DataFrame loaded using metadata options

    Attributes
    protected
    Definition Classes
    JsonIngestionJob → IngestionJob
  20. def loadJsonData(): Dataset[String]

    Loads the raw JSON messages, one message per row, without inferring the schema (an illustrative read sketch appears after the member list).

    returns
    Spark DataFrame where each row holds a single string

    Attributes
    protected
    Definition Classes
    KafkaIngestionJob → JsonIngestionJob
  21. val logger: Logger

    Attributes
    protected
    Definition Classes
    StrictLogging
  22. lazy val metadata: Metadata

    Merged metadata

    Definition Classes
    IngestionJob
  23. def name: String

    Definition Classes
    JsonIngestionJob → JobBase
  24. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  25. final def notify(): Unit

    Definition Classes
    AnyRef
  26. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  27. val now: Timestamp

    Definition Classes
    IngestionJob
  28. var offsets: List[(Int, Long)]

  29. val options: Map[String, String]

    Definition Classes
    JsonIngestionJob → IngestionJob
  30. def parseViewDefinition(valueWithEnv: String): (SinkType, Option[JdbcConfigName], String)

    Attributes
    protected
    Definition Classes
    JobBase
  31. def partitionDataset(dataset: DataFrame, partition: List[String]): DataFrame

    Attributes
    protected
    Definition Classes
    SparkJob
  32. def partitionedDatasetWriter(dataset: DataFrame, partition: List[String]): DataFrameWriter[Row]

    Partition a dataset using dataset columns. To partition the dataset using the ingestion time, use the reserved column names:

    • comet_date
    • comet_year
    • comet_month
    • comet_day
    • comet_hour
    • comet_minute

    These columns are renamed to "date", "year", "month", "day", "hour", "minute" in the dataset and their values are set to the current date/time (a usage sketch appears after the member list).

    dataset: Input dataset
    partition: list of columns to use for partitioning

    returns
    A DataFrameWriter partitioned on the given columns

    Attributes
    protected
    Definition Classes
    SparkJob
  33. val path: List[Path]

    Input dataset path

    Definition Classes
    JsonIngestionJob → IngestionJob
  34. def registerUdf(udf: String): Unit

    Attributes
    protected
    Definition Classes
    SparkJob
  35. def reorderAttributes(dataFrame: DataFrame): List[Attribute]

    Definition Classes
    IngestionJob
  36. def run(): Try[JobResult]

    Main entry point as required by the Spark Job interface

    returns
    Success(JobResult) if the ingestion succeeds, Failure otherwise

    Definition Classes
    KafkaIngestionJob → IngestionJob → JobBase
  37. def saveAccepted(acceptedDF: DataFrame): (DataFrame, Path)

    Merges the new and existing datasets if required, then saves using Overwrite / Append mode.

    Attributes
    protected
    Definition Classes
    IngestionJob
  38. def saveRejected(rejectedRDD: RDD[String]): Try[Path]

    Attributes
    protected
    Definition Classes
    IngestionJob
  39. val schema: Schema

    Input Dataset Schema

    Definition Classes
    JsonIngestionJob → IngestionJob
  40. val schemaHandler: SchemaHandler

    Definition Classes
    JsonIngestionJob → IngestionJob
  41. lazy val schemaSparkType: StructType

    Definition Classes
    JsonIngestionJob
  42. lazy val session: SparkSession

    Definition Classes
    SparkJob
  43. implicit val settings: Settings

    Definition Classes
    JsonIngestionJob → JobBase
  44. lazy val sparkEnv: SparkEnv

    Definition Classes
    SparkJob
  45. val storageHandler: StorageHandler

    Storage Handler

    Definition Classes
    JsonIngestionJob → IngestionJob
  46. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  47. def toString(): String

    Definition Classes
    AnyRef → Any
  48. val types: List[Type]

    List of globally defined types

    Definition Classes
    JsonIngestionJob → IngestionJob
  49. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  50. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  51. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
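
For loadJsonData above, this page documents only the returned shape. As an illustration (not this project's implementation), raw Kafka messages can be read into a Dataset[String] with the standard Spark Kafka source; the broker address and topic name below are placeholders, and the spark-sql-kafka package is assumed to be on the classpath (e.g. in a spark-shell):

    // Illustrative only: batch-reads a Kafka topic and keeps each raw
    // message value as a string, matching the Dataset[String] contract.
    import org.apache.spark.sql.{Dataset, SparkSession}

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    val messages: Dataset[String] = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
      .option("subscribe", "my-json-topic")                 // assumed topic
      .load()
      .selectExpr("CAST(value AS STRING)")
      .as[String]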
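
For the reserved comet_* columns described in partitionedDatasetWriter above, a hedged usage sketch: a hypothetical helper inside a SparkJob subclass (the method is protected), with a placeholder target path:

    // Hypothetical helper in a SparkJob subclass: partitions output by ingestion date.
    import org.apache.spark.sql.{DataFrame, SaveMode}

    protected def writeByIngestionDate(df: DataFrame, targetPath: String): Unit = {
      val writer = partitionedDatasetWriter(df, List("comet_year", "comet_month", "comet_day"))
      // The reserved columns are renamed, so files land under .../year=YYYY/month=MM/day=DD
      writer.mode(SaveMode.Append).parquet(targetPath)
    }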
