Class com.ebiznext.comet.job.ingest.SimpleJsonIngestionJob

class SimpleJsonIngestionJob extends DsvIngestionJob

Parses a simple one-level JSON file. Complex types such as arrays and maps are not supported; use JsonIngestionJob for those. Restricting this class to simple JSON is what makes it much faster.
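
For illustration, a hedged sketch (file path and field names are hypothetical) of the flat, one-object-per-line JSON this job targets, read here with a plain Spark session:

    import org.apache.spark.sql.SparkSession

    object SimpleJsonSketch extends App {
      val spark = SparkSession.builder().master("local[*]").appName("simple-json").getOrCreate()

      // Flat records with scalar values only: the shape SimpleJsonIngestionJob expects, e.g.
      //   {"id": 1, "name": "alice", "signup": "2020-01-01"}
      // Nested objects or arrays, e.g. {"tags": ["a", "b"]}, call for JsonIngestionJob instead.
      val df = spark.read.json("/tmp/incoming/simple.json") // hypothetical path
      df.printSchema() // flat schema (id, name, signup), no structs or arrays

      spark.stop()
    }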

Linear Supertypes
DsvIngestionJob, IngestionJob, SparkJob, JobBase, StrictLogging, AnyRef, Any

Instance Constructors

  1. new SimpleJsonIngestionJob(domain: Domain, schema: Schema, types: List[Type], path: List[Path], storageHandler: StorageHandler, schemaHandler: SchemaHandler)(implicit settings: Settings)

    domain: Input Dataset Domain
    schema: Input Dataset Schema
    types: List of globally defined types
    path: Input dataset path
    storageHandler: Storage Handler
    schemaHandler: Schema Handler
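
A minimal wiring sketch (every ??? is a placeholder for an object the framework normally builds from configuration and the domain/schema definition files; the input path is hypothetical):

    import org.apache.hadoop.fs.Path
    import scala.util.Try
    // Comet imports (Settings, Domain, Schema, Type, handlers, JobResult) omitted for brevity.

    implicit val settings: Settings = ???
    val domain: Domain = ???                 // Input Dataset Domain
    val schema: Schema = ???                 // Input Dataset Schema
    val types: List[Type] = ???              // globally defined types
    val storageHandler: StorageHandler = ??? // Storage Handler
    val schemaHandler: SchemaHandler = ???   // Schema Handler

    val job = new SimpleJsonIngestionJob(
      domain,
      schema,
      types,
      List(new Path("/incoming/mydomain/myfile.json")), // hypothetical input path
      storageHandler,
      schemaHandler
    )
    val result: Try[JobResult] = job.run()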

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def analyze(fullTableName: String): Any

    Definition Classes
    SparkJob
  5. def applyIgnore(dfIn: DataFrame): Dataset[Row]

    Definition Classes
    IngestionJob
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def cleanHeaderCol(header: String): String

    Removes any extra quote / BOM from the header.

    header: Header column name

    Definition Classes
    DsvIngestionJob
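
    A hedged illustration of the kind of cleanup involved (the exact rules may differ from the real implementation):

      // Sketch only: strip a UTF-8 BOM and surrounding double quotes from a header cell.
      def cleanHeaderColSketch(header: String): String =
        header.replace("\uFEFF", "").replace("\"", "")

      cleanHeaderColSketch("\uFEFF\"customer_id\"") // => "customer_id"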
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. val domain: Domain

    Input Dataset Domain

    Definition Classes
    DsvIngestionJob → IngestionJob
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def getWriteMode(): WriteMode

    Definition Classes
    IngestionJob
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. def ingest(dataset: DataFrame): (RDD[_], RDD[_])

    Applies the schema to the dataset. This is where all the magic happens: valid records are stored in the accepted path / table and invalid records in the rejected path / table.

    dataset: Spark Dataset

    Definition Classes
    DsvIngestionJob → IngestionJob
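
    A hedged call-site sketch (the ordering of the returned pair, rejected then accepted, is assumed here; verify it against the implementation before relying on it):

      // Sketch: split the loaded dataset into rejected and accepted record sets.
      val (rejectedRDD, acceptedRDD) = job.ingest(dataset)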
  17. def intersectHeaders(datasetHeaders: List[String], schemaHeaders: List[String]): (List[String], List[String])

    datasetHeaders: Headers found in the dataset
    schemaHeaders: Headers defined in the schema

    returns
    two lists: one with the columns present in both the schema and the dataset, and another with the headers present in the dataset only

    Definition Classes
    DsvIngestionJob
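
    A plausible one-liner capturing that contract (a sketch, not the actual implementation):

      // Sketch: partition dataset headers into (present in the schema, dataset-only).
      def intersectHeadersSketch(datasetHeaders: List[String], schemaHeaders: List[String]): (List[String], List[String]) =
        datasetHeaders.partition(schemaHeaders.contains)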
  18. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  19. def loadDataSet(): Try[DataFrame]

    Loads the dataset using the Spark reader and all metadata. Does not infer the schema; columns not defined in the schema are dropped from the dataset (requires datasets with a header).

    returns
    Spark Dataset

    Definition Classes
    SimpleJsonIngestionJob → DsvIngestionJob → IngestionJob
  20. val logger: Logger

    Attributes
    protected
    Definition Classes
    StrictLogging
  21. def merge(inputDF: DataFrame, existingDF: DataFrame, merge: MergeOptions): DataFrame

    Merges incoming and existing dataframes using the merge options.

    returns
    merged dataframe

    Definition Classes
    IngestionJob
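
    A hedged sketch of one common key-based strategy (the actual behavior is driven by MergeOptions and may differ):

      import org.apache.spark.sql.DataFrame

      // Sketch: keep all incoming rows, plus existing rows whose key does not
      // appear in the incoming batch (a typical upsert-style merge).
      def mergeSketch(inputDF: DataFrame, existingDF: DataFrame, key: Seq[String]): DataFrame =
        inputDF.unionByName(existingDF.join(inputDF, key, "left_anti"))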
  22. lazy val metadata: Metadata

    Merged metadata

    Definition Classes
    IngestionJob
  23. def name: String

    returns
    Spark Job name

    Definition Classes
    DsvIngestionJob → JobBase
  24. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  25. final def notify(): Unit

    Definition Classes
    AnyRef
  26. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  27. val now: Timestamp

    Definition Classes
    IngestionJob
  28. def partitionDataset(dataset: DataFrame, partition: List[String]): DataFrame

    Definition Classes
    SparkJob
  29. def partitionedDatasetWriter(dataset: DataFrame, partition: List[String]): DataFrameWriter[Row]

    Partitions a dataset using dataset columns. To partition the dataset using the ingestion time, use the reserved column names:

    • comet_date
    • comet_year
    • comet_month
    • comet_day
    • comet_hour
    • comet_minute

    These columns are renamed to "date", "year", "month", "day", "hour", "minute" in the dataset, and their values are set to the current date/time.

    dataset: Input dataset
    partition: list of columns to use for partitioning

    returns
    a DataFrameWriter partitioned by the given columns

    Definition Classes
    SparkJob
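
    A hedged usage sketch (job, df and the target path are hypothetical):

      // Sketch: write the dataset partitioned by ingestion year / month / day.
      val writer = job.partitionedDatasetWriter(df, List("comet_year", "comet_month", "comet_day"))
      writer.mode("append").parquet("/tmp/datasets/accepted/mydomain") // hypothetical target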
  30. val path: List[Path]

    Input dataset path

    Definition Classes
    DsvIngestionJob → IngestionJob
  31. def rowValidator(): DsvValidator

    Definition Classes
    DsvIngestionJob
  32. def run(): Try[JobResult]

    Main entry point as required by the Spark Job interface.

    returns
    the job result, wrapped in a Try

    Definition Classes
    IngestionJob → JobBase
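
    A minimal call-site sketch:

      import scala.util.{Failure, Success}

      job.run() match {
        case Success(result) => println(s"ingestion done: $result") // accepted / rejected outputs written
        case Failure(e)      => throw e // or log and surface the error
      }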
  33. def saveAccepted(acceptedRDD: RDD[Row], orderedSparkTypes: StructType): (DataFrame, Path)

    Definition Classes
    DsvIngestionJob
  34. def saveAccepted(acceptedDF: DataFrame): (DataFrame, Path)

    Merges the new and existing datasets if required, then saves using overwrite / append mode.

    Definition Classes
    IngestionJob
  35. def saveRejected(rejectedRDD: RDD[String]): Try[Path]

    Definition Classes
    IngestionJob
  36. def saveRows(dataset: DataFrame, targetPath: Path, writeMode: WriteMode, area: StorageArea, merge: Boolean): DataFrame

    Saves the typed dataset in Parquet. If Hive support is active, also registers it as a Hive table; if analyze is active, also computes basic statistics.

    dataset: dataset to save
    targetPath: absolute path
    writeMode: append or overwrite
    area: accepted or rejected area

    Definition Classes
    IngestionJob
  37. val schema: Schema

    Input Dataset Schema

    Definition Classes
    DsvIngestionJob → IngestionJob
  38. val schemaHandler: SchemaHandler

    Definition Classes
    DsvIngestionJob → IngestionJob
  39. val schemaHeaders: List[String]

    Dataset header names as defined by the schema

    Definition Classes
    DsvIngestionJob
  40. lazy val session: SparkSession

    Definition Classes
    SparkJob
  41. implicit val settings: Settings

    Definition Classes
    DsvIngestionJob → JobBase
  42. def sink(mergedDF: DataFrame): Unit

    Definition Classes
    IngestionJob
  43. lazy val sparkEnv: SparkEnv

    Definition Classes
    SparkJob
  44. val storageHandler: StorageHandler

    Storage Handler

    Definition Classes
    DsvIngestionJob → IngestionJob
  45. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  46. def toString(): String

    Definition Classes
    AnyRef → Any
  47. val types: List[Type]

    List of globally defined types

    Definition Classes
    DsvIngestionJob → IngestionJob
  48. def validateHeader(datasetHeaders: List[String], schemaHeaders: List[String]): Boolean

    datasetHeaders: Headers found in the dataset
    schemaHeaders: Headers defined in the schema

    returns
    true if all headers defined in the schema exist in the dataset

    Definition Classes
    DsvIngestionJob
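
    A plausible sketch of that check (not the actual implementation):

      // Sketch: every header defined in the schema must be present in the dataset.
      def validateHeaderSketch(datasetHeaders: List[String], schemaHeaders: List[String]): Boolean =
        schemaHeaders.forall(datasetHeaders.contains)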
  49. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  50. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  51. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
