Class

com.ebiznext.comet.job.ingest

JsonIngestionJob

class JsonIngestionJob extends IngestionJob

Main class used to ingest complex JSON files. If your JSON contains only one level of simple attributes (i.e. DSV-like data in JSON format), use SIMPLE_JSON instead: it is much faster.

Linear Supertypes
IngestionJob, SparkJob, StrictLogging, AnyRef, Any

Instance Constructors

  1. new JsonIngestionJob(domain: Domain, schema: Schema, types: List[Type], path: List[Path], storageHandler: StorageHandler, schemaHandler: SchemaHandler)(implicit settings: Settings)

    domain
    : Input Dataset Domain

    schema
    : Input Dataset Schema

    types
    : List of globally defined types

    path
    : Input dataset path

    storageHandler
    : Storage Handler

    schemaHandler
    : Schema Handler
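
    Example (a hedged construction sketch: every value below, the domain, schema, types, handlers and input path, is hypothetical and would normally come from your loaded Comet configuration and metadata; the import paths are assumed from the package layout):

      import com.ebiznext.comet.config.Settings
      import com.ebiznext.comet.schema.handlers.{ SchemaHandler, StorageHandler }
      import com.ebiznext.comet.schema.model.{ Domain, Schema, Type }
      import org.apache.hadoop.fs.Path

      implicit val settings: Settings = ???    // loaded from your Comet config
      val storageHandler: StorageHandler = ??? // e.g. an HDFS-backed handler
      val schemaHandler: SchemaHandler = ???   // handler that loaded the metadata
      val domain: Domain = ???                 // Input Dataset Domain
      val schema: Schema = ???                 // Input Dataset Schema
      val types: List[Type] = ???              // globally defined types

      val job = new JsonIngestionJob(
        domain,
        schema,
        types,
        List(new Path("/incoming/mydomain/myfile.json")),
        storageHandler,
        schemaHandler
      )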

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. val domain: Domain

    : Input Dataset Domain

    Definition Classes
    JsonIngestionJob → IngestionJob
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. def getWriteMode(): WriteMode

    Definition Classes
    IngestionJob
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. def index(mergedDF: DataFrame): Unit

    Definition Classes
    IngestionJob
  14. def ingest(dataset: DataFrame): (RDD[_], RDD[_])

    Where the magic happens.

    dataset
    : input dataset of raw JSON strings, one record per row

    Definition Classes
    JsonIngestionJob → IngestionJob
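
    Example (hypothetical usage: rawDF is the single-column DataFrame of raw JSON lines produced by loadDataSet(); which element of the returned pair is rejected vs. accepted is an assumption here, check IngestionJob for the actual contract):

      import scala.util.Success

      job.loadDataSet() match {
        case Success(rawDF) =>
          val (rejectedRDD, acceptedRDD) = job.ingest(rawDF)
          println(s"accepted=${acceptedRDD.count()} rejected=${rejectedRDD.count()}")
        case _ => ()
      }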
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. def loadDataSet(): Try[DataFrame]

    Load the JSON file as an RDD of String.

    returns
    : Spark Dataframe loaded using metadata options

    Definition Classes
    JsonIngestionJob → IngestionJob
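
    A minimal sketch of an equivalent load (the real implementation applies the metadata options and handles multiline JSON; this version only reads one raw string per line into a single-column Dataframe):

      import org.apache.spark.sql.{ DataFrame, SparkSession }
      import scala.util.Try

      def loadRawJson(session: SparkSession, paths: List[String]): Try[DataFrame] =
        Try(session.read.textFile(paths: _*).toDF("value"))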
  17. val logger: Logger

    Attributes
    protected
    Definition Classes
    StrictLogging
  18. def merge(inputDF: DataFrame, existingDF: DataFrame, merge: MergeOptions): DataFrame

    Merge incoming and existing dataframes using the merge options.

    returns
    : the merged dataframe

    Definition Classes
    IngestionJob
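
    A hedged sketch of what such a merge typically does: keep the most recent record per key. The parameters `keys` and `timestamp` stand in for fields of MergeOptions and are illustrative assumptions, and both dataframes are assumed to share the same schema:

      import org.apache.spark.sql.DataFrame
      import org.apache.spark.sql.expressions.Window
      import org.apache.spark.sql.functions.{ col, row_number }

      def mergeLatest(inputDF: DataFrame, existingDF: DataFrame,
                      keys: List[String], timestamp: String): DataFrame = {
        // Rank rows within each key group, newest timestamp first
        val byKeyLatestFirst =
          Window.partitionBy(keys.map(col): _*).orderBy(col(timestamp).desc)
        existingDF.union(inputDF)
          .withColumn("__rank", row_number().over(byKeyLatestFirst))
          .where(col("__rank") === 1) // keep only the latest row per key
          .drop("__rank")
      }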
  19. lazy val metadata: Metadata

    Merged metadata.

    Definition Classes
    IngestionJob
  20. def name: String

    Definition Classes
    JsonIngestionJob → SparkJob
  21. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  22. final def notify(): Unit

    Definition Classes
    AnyRef
  23. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  24. val now: Timestamp

    Definition Classes
    IngestionJob
  25. def partitionDataset(dataset: DataFrame, partition: List[String]): DataFrame

    Definition Classes
    SparkJob
  26. def partitionedDatasetWriter(dataset: DataFrame, partition: List[String]): DataFrameWriter[Row]

    Partition a dataset using dataset columns. To partition the dataset using the ingestion time, use the reserved column names:

    • comet_year
    • comet_month
    • comet_day
    • comet_hour
    • comet_minute

    These columns are renamed to "year", "month", "day", "hour" and "minute" in the dataset, and their values are set to the current date/time.

    dataset
    : Input dataset

    partition
    : list of columns to use for partitioning

    returns
    : a DataFrameWriter partitioned by the given columns

    Definition Classes
    SparkJob
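
    Example (hypothetical usage: typedDF and the target path are placeholders; the job is partitioned by ingestion date via the reserved column names):

      val typedDF: org.apache.spark.sql.DataFrame = ??? // hypothetical typed dataset

      val writer = job.partitionedDatasetWriter(
        typedDF,
        List("comet_year", "comet_month", "comet_day")
      )
      writer.mode("append").parquet("/datasets/accepted/mydomain/myschema")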
  27. val path: List[Path]

    : Input dataset path

    Definition Classes
    JsonIngestionJob → IngestionJob
  28. def run(): Try[SparkSession]

    Main entry point as required by the Spark Job interface.

    returns
    : the Spark Session used for the job

    Definition Classes
    IngestionJob → SparkJob
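
    Example (driving the job end to end; run() wraps execution in a Try, so failures surface as a Failure instead of a thrown exception):

      import scala.util.{ Failure, Success }

      job.run() match {
        case Success(session) => println(s"Ingestion OK: ${session.sparkContext.appName}")
        case Failure(e)       => println(s"Ingestion failed: ${e.getMessage}")
      }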
  29. def saveAccepted(acceptedDF: DataFrame): (DataFrame, Path)

    Merge the new and existing datasets if required, then save using overwrite / append mode.

    Definition Classes
    IngestionJob
  30. def saveRejected(rejectedRDD: RDD[String]): Try[Path]

    Definition Classes
    IngestionJob
  31. def saveRows(dataset: DataFrame, targetPath: Path, writeMode: WriteMode, area: StorageArea, merge: Boolean): (DataFrameWriter[Row], String)

    Save the typed dataset in Parquet. If Hive support is active, also register it as a Hive table; if analyze is active, also compute basic statistics.

    dataset
    : dataset to save

    targetPath
    : absolute path

    writeMode
    : append or overwrite

    area
    : accepted or rejected area

    Definition Classes
    IngestionJob
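
    A condensed sketch of the save step under simple assumptions (the real saveRows also handles Hive registration, statistics and the merge flag; reducing WriteMode to a boolean append flag here is an assumption):

      import org.apache.hadoop.fs.Path
      import org.apache.spark.sql.{ DataFrame, SaveMode }

      def saveParquet(df: DataFrame, targetPath: Path,
                      append: Boolean, partition: List[String]): Unit =
        df.write
          .mode(if (append) SaveMode.Append else SaveMode.Overwrite)
          .partitionBy(partition: _*)
          .parquet(targetPath.toString)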
  32. val schema: Schema

    : Input Dataset Schema

    Definition Classes
    JsonIngestionJob → IngestionJob
  33. val schemaHandler: SchemaHandler

    Definition Classes
    JsonIngestionJob → IngestionJob
  34. lazy val schemaSparkType: StructType

  35. lazy val session: SparkSession

    Definition Classes
    SparkJob
  36. implicit val settings: Settings

    Definition Classes
    JsonIngestionJob → SparkJob
  37. lazy val sparkEnv: SparkEnv

    Definition Classes
    SparkJob
  38. val storageHandler: StorageHandler

    : Storage Handler

    Definition Classes
    JsonIngestionJob → IngestionJob
  39. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  40. def toString(): String

    Definition Classes
    AnyRef → Any
  41. val types: List[Type]

    : List of globally defined types

    Definition Classes
    JsonIngestionJob → IngestionJob
  42. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  43. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def saveAccepted(acceptedRDD: RDD[Row]): Path

    Use the schema we used for validation when saving.

    Annotations
    @deprecated
    Deprecated
    (Since version ) We let Spark compute the final schema
