Partition a dataset using dataset columns. To partition the dataset by ingestion time, use the reserved column names:
: Input dataset
: List of columns to use for partitioning
The Spark session used to run this job
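As a rough illustration of what partitioning by columns means, the sketch below groups rows by the values of the chosen columns. It is plain Python rather than Spark, and the function name `partition_rows`, the sample rows, and the column names are all hypothetical; it does not show how this job partitions data internally.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def partition_rows(rows: List[Row], columns: List[str]) -> Dict[Tuple[Any, ...], List[Row]]:
    """Group rows by the tuple of values they hold in the partition columns.

    Hypothetical helper for illustration only; not part of the documented API.
    """
    partitions: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        # The partition key is built from the column values, in the given order.
        key = tuple(row[column] for column in columns)
        partitions[key].append(row)
    return dict(partitions)


# Made-up sample data.
rows = [
    {"year": 2023, "country": "FR", "amount": 10},
    {"year": 2023, "country": "US", "amount": 20},
    {"year": 2024, "country": "FR", "amount": 30},
    {"year": 2023, "country": "FR", "amount": 40},
]
by_year_country = partition_rows(rows, ["year", "country"])
```

Each distinct combination of column values ends up in its own partition, so the more distinct combinations the chosen columns have, the more partitions are produced.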
Forces every Spark job to implement its entry point within the "run" method.
: Spark Session used for the job
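The idea of forcing every job to expose a "run" entry point can be sketched with an abstract base class. This is a minimal Python illustration, not the project's actual code: the class names `SparkJob` and `EchoJob` are hypothetical, and the session is treated as an opaque object so the sketch runs without Spark installed.

```python
from abc import ABC, abstractmethod
from typing import Any


class SparkJob(ABC):
    """Hypothetical base class: subclasses must implement run()."""

    def __init__(self, session: Any) -> None:
        # In a real job this would be a Spark session; here it is any object.
        self.session = session

    @abstractmethod
    def run(self) -> Any:
        """Entry point of the job."""


class EchoJob(SparkJob):
    """Hypothetical concrete job used only to show the contract."""

    def run(self) -> str:
        return f"running with {self.session}"


result = EchoJob("local-session").run()
```

A subclass that does not define `run` cannot be instantiated, which is how the base class enforces a single, predictable entry point.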
Convert Parquet files to CSV.
The folder hierarchy should be in the form /input_folder/domain/schema/part*.parquet.
Once converted, the CSV output is written to /output_folder/domain/schema.csv.
When the specified number of partitions is 1, /output_folder/domain/schema.csv is the file containing the data; otherwise, it is a folder containing the part*.csv files.
When output_folder is not specified, input_folder is used as the base output folder.
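The path rules described above can be sketched as a small helper. This is an illustration under assumptions: the function name `csv_output_path` and the sample domain and schema names are hypothetical, and the helper only computes the target path; it does not read Parquet or write CSV.

```python
from pathlib import PurePosixPath
from typing import Optional


def csv_output_path(
    input_folder: str,
    domain: str,
    schema: str,
    output_folder: Optional[str] = None,
) -> PurePosixPath:
    """Return the location of the converted CSV output for one schema.

    Falls back to input_folder as the base when output_folder is not given.
    """
    base = PurePosixPath(output_folder if output_folder is not None else input_folder)
    return base / domain / f"{schema}.csv"


# Hypothetical domain and schema names.
explicit = csv_output_path("/input_folder", "sales", "orders", "/output_folder")
fallback = csv_output_path("/input_folder", "sales", "orders")
```

Depending on the number of partitions, the returned path designates either a single CSV file (one partition) or a folder holding the part*.csv files.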