Action that deletes snapshots according to the cleanup strategy. It can clean up one or more labels.
root folder that contains label folders
returns the list of snapshot folders to remove
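A cleanup strategy can be sketched as a plain function from the available snapshot folder names to the ones that should be removed. This is an illustrative example only, not part of the Waimak API; it assumes snapshot folder names sort lexicographically by date (e.g. `snapshot_folder=20181128`), so the newest ones sort last.

```scala
// Hypothetical cleanup strategy: keep the `numToKeep` most recent snapshots
// under a label folder and return the rest for removal.
object CleanupStrategies {
  def keepNewest(numToKeep: Int)(snapshots: Seq[String]): Seq[String] =
    snapshots.sorted.dropRight(numToKeep)
}
```

For example, `CleanupStrategies.keepNewest(1)` applied to two daily snapshots returns only the older one for deletion.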
Adds actions necessary to commit labels as parquet; supports snapshot folders and interaction with a DB connector.
Created by Alexei Perelighin on 2018/11/05
folder under which each committed label will store its data. Ex: baseFolder/label_1/
optional name of the snapshot folder that will be used by all of the labels committed via this committer. It must be a full name and must not be the same as the snapshot folder of any previous commit for any of the managed labels. Ex: baseFolder/label_1/snapshot_folder=20181128, baseFolder/label_1/snapshot_folder=20181129, baseFolder/label_2/snapshot_folder=20181128, baseFolder/label_2/snapshot_folder=20181129
optional function that takes the list of available snapshots and returns the list of snapshots to remove
optional connector to the DB.
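The snapshot-folder constraint above can be sketched as a small helper: build the `baseFolder/label/snapshotFolder` path, and reject a snapshot name that already exists under any managed label. All names here (`SnapshotNaming`, `validateSnapshotName`, `existingSnapshots`) are illustrative assumptions, not part of the Waimak API.

```scala
// Hypothetical sketch of the committer's snapshot-name handling.
object SnapshotNaming {
  // Layout used by the committer: baseFolder/label_1/snapshot_folder=20181128
  def snapshotPath(baseFolder: String, label: String, snapshotFolder: String): String =
    s"$baseFolder/$label/$snapshotFolder"

  // The new snapshot folder name must not clash with a previous commit
  // for any of the labels managed by this committer.
  def validateSnapshotName(existingSnapshots: Map[String, Seq[String]],
                           snapshotFolder: String): Either[String, String] = {
    val clashes = existingSnapshots.collect {
      case (label, snaps) if snaps.contains(snapshotFolder) => label
    }
    if (clashes.isEmpty) Right(snapshotFolder)
    else Left(s"Snapshot folder $snapshotFolder already exists for labels: ${clashes.mkString(", ")}")
  }
}
```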
Instances of this class build a bridge between the OOP part of the Waimak engine and the functional definition of the data flow.
Created by Alexei Perelighin on 03/11/17.
Introduces the Spark session into the data flows
Defines a functional builder for Spark-specific data flows and common functionality such as reading CSV/parquet/Hive data, adding Spark SQL steps and Dataset steps, writing data out in various formats, and staging and committing multiple outputs into storage such as HDFS and Hive/Impala.
Context required in a Spark data flow (SparkSession and FileSystem)
Created by Vicky Avison on 23/02/2018.
the SparkSession
Spark-specific simple action that sets Spark-specific generics.
Write a file or files with a specific filename to a folder.
Allows you to control the final output filename without the Spark-generated part UUIDs.
The filename will be $filenamePrefix.extension if the number of files is 1, otherwise $filenamePrefix.$fileNumber.extension, where the file number is incremental and zero-padded.
Label to write
Base location of temporary folder
Destination path to put files in
Number of files to generate
Prefix of the filename, up to the file number and extension
Format to write (e.g. parquet, csv)
Options to pass to the DataFrameWriter
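The filename rule above can be sketched as a pure function. This is an illustrative sketch, not the library's implementation; in particular, the zero-padding width (here, the number of digits in the total file count) is an assumption.

```scala
// Hypothetical helper producing the final output filenames:
// $filenamePrefix.extension for a single file, otherwise
// $filenamePrefix.$fileNumber.extension with a zero-padded file number.
object OutputFileNames {
  def apply(filenamePrefix: String, extension: String, numberOfFiles: Int): Seq[String] =
    if (numberOfFiles == 1) Seq(s"$filenamePrefix.$extension")
    else {
      // Assumed padding width: enough digits for the largest file number.
      val width = numberOfFiles.toString.length
      (1 to numberOfFiles).map { i =>
        val fileNumber = ("%0" + width + "d").format(i)
        s"$filenamePrefix.$fileNumber.$extension"
      }
    }
}
```

For example, a single CSV output would be named `report.csv`, while twelve parquet files would be numbered `data.01.parquet` through `data.12.parquet`.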
Defines builder functions that add various interceptors to a SparkDataFlow
Created by Alexei Perelighin on 2018/02/24
Spark Actions are now automatically included in a SparkDataFlow