io.smartdatalake.workflow.dataobject
The partition layout defines how partition values are extracted from the path. Use "%<colname>%" as a token to extract the value of a partition column. With "%<colname:regex>%", a regex can be given to limit the match. This is especially useful if no character delimits the last token from the rest of the path, or two adjacent tokens from each other.
Overwrite or Append new data.
Optional definition of partitions expected to exist. Define a Spark SQL expression that is evaluated against a PartitionValues instance and returns true or false. Default is to expect all partitions to exist.
Create empty partition
Create empty partitions for partition values not yet existing
Delete all data. This is used to implement SaveMode.Overwrite.
Delete given files. This is used to cleanup files after they are processed.
Delete given partitions. This is used to cleanup partitions after they are processed.
Optional definition of partitions expected to exist. Define a Spark SQL expression that is evaluated against a PartitionValues instance and returns true or false. Default is to expect all partitions to exist.
Extract partition values from a given file path
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
the factory (object) for this class.
Definition of fileName. Default is an asterisk to match everything. This is concatenated with the partition layout to search for files.
Filter list of partition values by expected partitions condition
Handle class cast exception when getting objects from instance registry
List files for given partition values
List of partition values to be filtered. If empty, all files in the root path of the DataObject are listed.
List of FileRefs
Get partition values formatted by partition layout
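As a rough illustration of formatting partition values by a layout, the sketch below substitutes each "%<colname>%" or "%<colname:regex>%" token with the value of that column. The object `PartitionStringSketch` and its `format` method are hypothetical names chosen for this example; they are not the DataObject's actual implementation.

```scala
import java.util.regex.Matcher

object PartitionStringSketch {
  // Matches %colname% and %colname:regex% tokens (assumption: the regex part contains no '%').
  private val tokenPattern = "%([A-Za-z0-9_]+)(?::[^%]+)?%".r

  /** Hypothetical helper: replace every token in `layout` with the value of its column. */
  def format(layout: String, values: Map[String, String]): String =
    tokenPattern.replaceAllIn(layout, m => {
      val column = m.group(1)
      val value = values.getOrElse(
        column,
        throw new IllegalArgumentException(s"No value for partition column '$column'")
      )
      // Quote the value so that '$' and '\' are inserted literally.
      Matcher.quoteReplacement(value)
    })
}
```

For instance, `PartitionStringSketch.format("dt=%dt%/type=%type:[a-z]+%", Map("dt" -> "20240101", "type" -> "abc"))` yields `dt=20240101/type=abc`.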
Method for subclasses to override the base path for this DataObject. This is for instance needed if pathPrefix is defined in a connection.
Prepare paths to be searched
A unique identifier for this instance.
List partitions on data object's root path
Additional metadata for the DataObject
The partition layout defines how partition values are extracted from the path. Use "%<colname>%" as a token to extract the value of a partition column. With "%<colname:regex>%", a regex can be given to limit the match. This is especially useful if no character delimits the last token from the rest of the path, or two adjacent tokens from each other.
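To make the token syntax more concrete, here is a minimal sketch of how a layout could be turned into a regular expression and applied to a path. It is not the library's implementation: `PartitionLayoutSketch` and `extract` are hypothetical names, the default value pattern `[^/]+` is an assumption, and a custom token regex is assumed to contain neither '%' nor capturing groups.

```scala
import java.util.regex.Pattern

object PartitionLayoutSketch {
  // Matches %colname% and %colname:regex% tokens (assumption: the regex part contains no '%').
  private val tokenPattern = "%([A-Za-z0-9_]+)(?::([^%]+))?%".r

  /** Hypothetical helper: returns column -> value pairs if `path` starts with the layout. */
  def extract(layout: String, path: String): Option[Map[String, String]] = {
    val tokens = tokenPattern.findAllMatchIn(layout).toList
    val regex = new StringBuilder("^")
    var pos = 0
    for (t <- tokens) {
      // Literal text between tokens must match exactly.
      regex.append(Pattern.quote(layout.substring(pos, t.start)))
      // Use the token's own regex if given, otherwise any run of non-slash characters (assumption).
      val valueRegex = Option(t.group(2)).getOrElse("[^/]+")
      regex.append("(").append(valueRegex).append(")")
      pos = t.end
    }
    regex.append(Pattern.quote(layout.substring(pos))).append(".*$")
    val m = Pattern.compile(regex.toString).matcher(path)
    if (m.matches()) {
      // Capture group i + 1 belongs to the i-th token (custom regexes must not add groups).
      Some(tokens.zipWithIndex.map { case (t, i) => t.group(1) -> m.group(i + 1) }.toMap)
    } else None
  }
}
```

With the layout `dt=%dt%/type=%type%`, the path `dt=20240101/type=abc/file.csv` gives `dt -> 20240101` and `type -> abc`, because the slashes delimit the values. With `%year:[0-9]{4}%%month:[0-9]{2}%` and the path `202401_data.csv`, the regexes are what separate `year -> 2024` from `month -> 01`; without them, nothing marks where one value ends and the next begins.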
Definition of partition columns
The root path of the files that are handled by this DataObject.
Runs operations after reading from DataObject
Runs operations after writing to DataObject
Runs operations before reading from DataObject
Runs operations before writing to DataObject. Note: As the transformed SubFeed does not yet exist in Action.preWrite, no partition values can be passed as parameters as in preRead.
Prepare & test DataObject's prerequisites
This runs during the "prepare" operation of the DAG.
Overwrite or Append new data.
Default separator for paths
Given some FileRefs for another DataObject, translate the paths to the root path of this DataObject
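One way to picture this translation is to strip the source DataObject's root path from each file's full path and prepend the target root. The sketch below assumes exactly that behaviour; the case class `FileRefSketch`, the object `TranslateFileRefsSketch` and the `translate` signature are hypothetical and only stand in for the real FileRef type and method.

```scala
object TranslateFileRefsSketch {
  // Hypothetical stand-in for a FileRef: the full path plus the bare file name.
  final case class FileRefSketch(fullPath: String, fileName: String)

  /** Hypothetical helper: re-root each path from `sourceRoot` to `targetRoot`. */
  def translate(
      refs: Seq[FileRefSketch],
      sourceRoot: String,
      targetRoot: String,
      separator: Char = '/'
  ): Seq[FileRefSketch] = {
    // Normalize both roots so a trailing separator makes no difference.
    val src = sourceRoot.stripSuffix(separator.toString)
    val tgt = targetRoot.stripSuffix(separator.toString)
    refs.map { ref =>
      require(
        ref.fullPath.startsWith(src + separator),
        s"${ref.fullPath} is not located under $sourceRoot"
      )
      // Keep the relative part (partition directories and file name) unchanged.
      ref.copy(fullPath = tgt + ref.fullPath.stripPrefix(src))
    }
  }
}
```

Under these assumptions, a reference with full path `/in/dt=1/a.csv` translated from root `/in` to root `/out/` ends up at `/out/dt=1/a.csv`.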
Connects to SFTP files. Requires the Java library "com.hierynomus % sshj % 0.21.1". The following authentication mechanisms are supported:
-> public/private key: the private key must be saved in ~/.ssh, and the public key must be registered on the server.
-> user/pwd authentication: user and password are taken from two variables set as parameters. These variables can come from clear text (CLEAR), a file (FILE) or an environment variable (ENV).