com.coxautodata.waimak.dataflow.spark
folder under which final labels will store their data. Ex: baseFolder/label_1/
optional name of the snapshot folder that will be used by all of the labels committed via this committer. It needs to be a full name and must not be the same as any previous snapshot folder of any label managed by the commit. Ex: baseFolder/label_1/snapshot_folder=20181128, baseFolder/label_1/snapshot_folder=20181129, baseFolder/label_2/snapshot_folder=20181128, baseFolder/label_2/snapshot_folder=20181129
optional function that takes the list of available snapshots and returns list of snapshots to remove
optional connector to the DB.
optional function that takes the list of available snapshots and returns list of snapshots to remove
Adds actions that are performed when all data is fully committed/moved into permanent storage.
Adds actions that are performed when all data is fully committed/moved into permanent storage. Can be used to do cleanup operations.
logical name of the commit
A UUID generated at runtime unique to a commit name
labels that were committed
data flow to which to add finalise actions
data flow with finalise actions
optional connector to the DB.
Takes a value a of type A and a message to log, logs the message at the desired level and returns a unchanged
a
Takes a value a of type A and a function message from A to String, logs the result of invoking message(a) at the level given by the level parameter, and returns a unchanged
a
logAndReturn(1, (num: Int) => s"number: $num", Info) // The log would contain an Info entry with the message "number: 1"
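The log-and-return behaviour described above can be sketched in isolation. This is a minimal illustration only: the Level type and the in-memory logged buffer are hypothetical stand-ins for the library's own logging facilities and are not part of its API.

```scala
object LogAndReturnSketch {
  // Hypothetical log levels, for illustration only.
  sealed trait Level
  case object Info extends Level
  case object Debug extends Level

  // Hypothetical in-memory sink so that the sketch needs no logging backend.
  val logged = scala.collection.mutable.ListBuffer.empty[(Level, String)]

  // Builds the message from the value, records it, and returns the value untouched.
  def logAndReturn[A](a: A, message: A => String, level: Level): A = {
    logged += ((level, message(a)))
    a
  }
}
```

Because the value is returned unchanged, a call such as LogAndReturnSketch.logAndReturn(1, (num: Int) => s"number: $num", LogAndReturnSketch.Info) can be placed in the middle of an expression without altering its result.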
Adds actions to the flow that move data to the permanent storage, simulating a wave commit
logical name of the commit
A UUID generated at runtime unique to a commit name
labels to move
data flow to which move actions are added
data flow with move actions
folder under which final labels will store their data.
folder under which final labels will store their data. Ex: baseFolder/label_1/
optional name of the snapshot folder that will be used by all of the labels committed via this committer.
optional name of the snapshot folder that will be used by all of the labels committed via this committer. It needs to be a full name and must not be the same as any previous snapshot folder of any label managed by the commit. Ex: baseFolder/label_1/snapshot_folder=20181128, baseFolder/label_1/snapshot_folder=20181129, baseFolder/label_2/snapshot_folder=20181128, baseFolder/label_2/snapshot_folder=20181129
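The folder layout in the examples above can be sketched as a small path builder. The object and function names here are hypothetical and only mirror the documented layout (base folder, then label, then the optional snapshot folder); they are not the library's API.

```scala
object LabelPathSketch {
  // Hypothetical helper: builds baseFolder/label[/snapshotFolder].
  def labelPath(baseFolder: String, label: String, snapshotFolder: Option[String]): String =
    (Seq(baseFolder.stripSuffix("/"), label) ++ snapshotFolder.toSeq).mkString("/")
}
```

With a snapshot folder of "snapshot_folder=20181128", label_1 would resolve to baseFolder/label_1/snapshot_folder=20181128; without one, it would resolve to baseFolder/label_1.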
Adds cache actions to the flow.
logical name of the commit
A UUID generated at runtime unique to a commit name
labels to cache
data flow to which the caching actions are added
data flow with caching actions
Validates that: 1) the data flow is a descendant of SparkDataFlow; 2) the data flow has a temp folder; 3) no committed label has an existing snapshot folder with the same name as the new one; 4) cleanup is only configured when a snapshot folder is defined
data flow to validate
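Checks 2 to 4 above can be sketched over a simplified model. Everything in this sketch is an assumption made for illustration: the Settings case class, the map of existing snapshots per label and the Either result are hypothetical, and the type check on the data flow (check 1) is left out because it depends on the library's own classes.

```scala
object ValidateSketch {
  // Hypothetical, simplified stand-in for the committer settings.
  final case class Settings(snapshotFolder: Option[String], cleanupConfigured: Boolean)

  def validate(
      tempFolder: Option[String],
      existingSnapshots: Map[String, Set[String]], // label -> snapshot folders already on storage
      committedLabels: Seq[String],
      settings: Settings
  ): Either[String, Unit] = {
    // Labels that already contain a snapshot folder with the new name.
    val clashing = for {
      snapshot <- settings.snapshotFolder.toSeq
      label <- committedLabels
      if existingSnapshots.getOrElse(label, Set.empty[String]).contains(snapshot)
    } yield label
    val clashingNames = clashing.mkString(", ")

    if (tempFolder.isEmpty) Left("data flow has no temp folder")
    else if (clashing.nonEmpty) Left("snapshot folder already exists for labels: " + clashingNames)
    else if (settings.cleanupConfigured && settings.snapshotFolder.isEmpty)
      Left("cleanup requires a snapshot folder")
    else Right(())
  }
}
```

Returning the first failed check as a Left keeps the sketch short; an implementation could equally collect all failures before reporting them.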
Set a cleanup strategy for this Parquet Committer
Configures a default implementation of a cleanup strategy based on dates encoded into snapshot folder name.
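A date-based cleanup strategy of this kind can be sketched as a function from the available snapshots to the snapshots to remove. This is an illustration under stated assumptions, not the library's implementation: the function name, its parameters (folderPrefix, dateFormat, keep) and the use of plain folder-name strings are all hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateCleanupSketch {
  // Hypothetical strategy: keep the `keep` most recent snapshots and return the rest for removal.
  // Folder names are assumed to look like "<folderPrefix>=<date>", e.g. "snapshot_folder=20181128".
  def dateBasedCleanup(folderPrefix: String, dateFormat: String, keep: Int): Seq[String] => Seq[String] = {
    val formatter = DateTimeFormatter.ofPattern(dateFormat)
    val prefix = folderPrefix + "="
    (snapshots: Seq[String]) =>
      snapshots
        .filter(_.startsWith(prefix))
        // Names whose date part does not parse are ignored and therefore never removed.
        .flatMap(name => Try(LocalDate.parse(name.stripPrefix(prefix), formatter)).toOption.map(date => (date, name)))
        .sortBy(_._1.toEpochDay)
        .reverse      // newest first
        .drop(keep)   // everything older than the newest `keep` snapshots
        .map(_._2)
  }
}
```

Sorting on the parsed date rather than on the raw folder name keeps the ordering correct for date formats that do not sort lexicographically.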
Sets new DB connector
Set a snapshot folder for this Parquet Committer
Adds actions necessary to commit labels as Parquet. Supports snapshot folders and interaction with a DB connector.
Created by Alexei Perelighin on 2018/11/05
folder under which final labels will store their data. Ex: baseFolder/label_1/
optional name of the snapshot folder that will be used by all of the labels committed via this committer. It needs to be a full name and must not be the same as any previous snapshot folder of any label managed by the commit. Ex: baseFolder/label_1/snapshot_folder=20181128, baseFolder/label_1/snapshot_folder=20181129, baseFolder/label_2/snapshot_folder=20181128, baseFolder/label_2/snapshot_folder=20181129
optional function that takes the list of available snapshots and returns list of snapshots to remove
optional connector to the DB.