: Job name as defined in the YML job description file
: Default location where the resulting dataset is stored when the task does not specify one
: Task to run
: SQL parameters to pass to SQL statements
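As a rough illustration, the parameters above might map onto a job constructor like the one below; the class and attribute names (`SqlJob`, `job_name`, `output_path`, `task`, `sql_parameters`) are assumptions for this sketch, not the project's actual API.

```python
class SqlJob:
    """Hypothetical sketch only: names here are illustrative, not the real API."""

    def __init__(self, job_name, output_path, task, sql_parameters=None):
        self.job_name = job_name        # Job name from the YML job description file
        self.output_path = output_path  # Default location for the resulting dataset
        self.task = task                # Task to run
        # Parameters substituted into the SQL statements
        self.sql_parameters = sql_parameters or {}
```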
Saves a dataset.
If the path is empty (the first time metrics are computed on the schema), the dataset can be written directly. If Parquet files are already stored at the path, create a temporary directory to compute on, then flush the path and move the updated metrics into it (a sketch follows the parameters).
: Dataset to be saved
: Path to save the file to
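A minimal sketch of that save flow, assuming a PySpark DataFrame and a local filesystem path; the real implementation may target HDFS/S3 and use different helpers, so treat the function name and path handling as assumptions.

```python
import os
import shutil
import tempfile

def save_dataset(df, path):
    """Sketch: write `df` to `path`, replacing any previously stored Parquet files."""
    if not os.path.exists(path) or not os.listdir(path):
        # Empty path: first write, nothing to merge.
        df.write.parquet(path, mode="overwrite")
        return
    # Parquet files already exist: write to a temporary directory first,
    # then flush the target path and move the updated metrics into it.
    tmp_dir = tempfile.mkdtemp(prefix="metrics_")
    df.write.parquet(tmp_dir, mode="overwrite")
    shutil.rmtree(path)
    shutil.move(tmp_dir, path)
```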
: Job name as defined in the YML job description file
Partition a dataset using dataset columns (see the sketch after the parameters).
To partition the dataset by ingestion time, use the reserved column names:
: Input dataset
: List of columns to use for partitioning.
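For illustration, a column-based partitioned write in PySpark could look like the following; the function name and output path are assumptions, while `partitionBy` is the standard `DataFrameWriter` API.

```python
def partition_dataset(df, partition_columns, output_path):
    """Sketch: write `df` partitioned by the given columns."""
    df.write.partitionBy(*partition_columns).parquet(output_path, mode="overwrite")

# Hypothetical usage with illustrative column names:
# partition_dataset(df, ["country", "ingestion_day"], "/data/out")
```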
The Spark session used to run this job
Forces every job to implement its entry point within the "run" method (a sketch follows the return description).
: Spark DataFrame for Spark jobs, None otherwise
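One conventional way to enforce this contract is an abstract base class; this is a sketch under that assumption, and the class name and constructor are illustrative, not necessarily how the project declares them.

```python
from abc import ABC, abstractmethod

class Job(ABC):
    """Sketch: base class forcing subclasses to implement "run"."""

    def __init__(self, spark):
        self.spark = spark  # The Spark session used to run this job

    @abstractmethod
    def run(self):
        """Entry point every job must implement.

        Returns a Spark DataFrame for Spark jobs, None otherwise.
        """
```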
Execute the SQL task and store the result as parquet/orc/.... If Hive support is enabled, also store it as a Hive table. If analyze support is active, also compute basic statistics for the dataset.
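A hedged sketch of that flow in PySpark; the function name, parameters, and the nesting of the analyze step under Hive support are assumptions, though `spark.sql`, `saveAsTable`, and `ANALYZE TABLE ... COMPUTE STATISTICS` are standard Spark APIs.

```python
def run_sql_task(spark, sql, output_path, table_name,
                 fmt="parquet", hive_support=False, analyze=False):
    """Sketch: execute a SQL task, persist the result, and optionally
    register it in Hive and compute statistics."""
    df = spark.sql(sql)  # Execute the SQL task
    # Persist the result in the chosen file format (parquet/orc/...)
    df.write.format(fmt).mode("overwrite").save(output_path)
    if hive_support:
        # Also register the result as a Hive table
        df.write.mode("overwrite").saveAsTable(table_name)
        if analyze:
            # ANALYZE TABLE needs a registered table, so the sketch
            # runs it only when Hive support is enabled
            spark.sql(f"ANALYZE TABLE {table_name} COMPUTE STATISTICS")
    return df
```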