: Input Dataset Domain
: Input Dataset Schema
: List of globally defined types
: Input dataset path
: Storage Handler
Where the magic happens
Input dataset as an RDD of String
Load the JSON as an RDD of String
Spark DataFrame loaded using metadata options
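The two-step flow above (raw strings first, parsed records second) can be sketched in plain Python; this is a stand-in for the Spark RDD/DataFrame pipeline, not the job's actual implementation:

```python
import json

# Stand-in for "load the JSON as an RDD of String":
# each element is one raw JSON line, not yet parsed.
raw_lines = [
    '{"id": 1, "name": "alice"}',
    '{"id": 2, "name": "bob"}',
]

# Parse each raw string into a record, mirroring what the DataFrame
# load does once the metadata options (schema, mode, ...) are applied.
records = [json.loads(line) for line in raw_lines]

print(records[0]["name"])  # alice
```

Keeping the raw-string stage separate is what lets the real job report malformed lines before any schema is applied.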
Merge incoming and existing DataFrames using merge options
Merged DataFrame
Merged metadata
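One common merge policy is an upsert keyed on one column, where incoming rows replace existing rows on key collision; the sketch below assumes that policy, whereas the real job derives it from its merge options:

```python
def merge(existing, incoming, key):
    """Upsert: incoming rows win on key collision (assumed policy)."""
    # Index existing rows by key, then overlay the incoming rows.
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row
    return list(merged.values())

existing = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
incoming = [{"id": 1, "v": "new"}, {"id": 3, "v": "add"}]
result = merge(existing, incoming, "id")
# id 1 updated, id 2 untouched, id 3 added
```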
Partition a dataset using dataset columns. To partition the dataset using the ingestion time, use the reserved column names:
: Input dataset
: list of columns to use for partitioning.
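Ingestion-time partitioning can be pictured as deriving extra columns from the ingestion timestamp; the column names `year`/`month`/`day` below are illustrative placeholders, not the framework's actual reserved names:

```python
from datetime import datetime

def add_ingestion_partitions(row, ingested_at):
    # Derive partition columns from the ingestion timestamp.
    # The names used here are hypothetical; substitute the
    # reserved column names from the framework's documentation.
    row = dict(row)
    row["year"] = ingested_at.year
    row["month"] = ingested_at.month
    row["day"] = ingested_at.day
    return row

row = add_ingestion_partitions({"id": 1}, datetime(2024, 5, 17))
```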
The Spark session used to run this job
: Input dataset path
Main entry point as required by the Spark Job interface
: Spark Session used for the job
Merge new and existing dataset if required. Save using overwrite / append mode.
Save typed dataset in Parquet. If Hive support is active, also register it as a Hive table; if analyze is active, also compute basic statistics.
: dataset to save
: absolute path
: Append or overwrite
: accepted or rejected area
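The save parameters above (path, mode, area) can be sketched with an in-memory store standing in for Parquet on a filesystem; names and signatures here are assumptions for illustration, not the job's API:

```python
import os

def target_path(base, area, dataset):
    # "area" selects the accepted vs rejected output, as described above.
    assert area in ("accepted", "rejected")
    return os.path.join(base, area, dataset)

def save(records, path, mode, store):
    # store maps path -> list of records; a stand-in for Parquet files.
    if mode == "overwrite":
        store[path] = list(records)
    elif mode == "append":
        store.setdefault(path, []).extend(records)
    else:
        raise ValueError(f"unknown save mode: {mode}")

store = {}
path = target_path("/data", "accepted", "users")
save([{"id": 1}], path, "overwrite", store)
save([{"id": 2}], path, "append", store)
```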
: Input Dataset Schema
: Storage Handler
: List of globally defined types
Main class to ingest complex JSON files. If your JSON contains only one level of simple attributes (i.e., essentially delimiter-separated values, but in JSON format), use SIMPLE_JSON instead; it is much faster.
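The distinction drawn above is whether every attribute is a scalar (the "simple" case) or some attributes are nested objects or arrays (the "complex" case). A minimal check, as a sketch:

```python
import json

def is_flat(record):
    # True when every attribute is a scalar: the one-level "simple"
    # case where the faster SIMPLE_JSON mode would apply.
    return all(not isinstance(v, (dict, list)) for v in record.values())

flat = json.loads('{"id": 1, "name": "alice"}')
nested = json.loads('{"id": 1, "address": {"city": "Paris"}}')
# is_flat(flat) is True, is_flat(nested) is False
```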