io.smartdatalake.workflow.dataobject
Delete all data. This is used to implement SaveMode.Overwrite.
Delete given files. This is used to clean up files after they are processed.
Definition of partitions that are expected to exist. This is used to validate that the partitions being read exist and actually return data. Define a Spark SQL expression that is evaluated against a PartitionValues instance and returns true or false, for example: "elements['yourColName'] > 2017"
true if partition is expected to exist.
Extract partition values from a given file path
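As an illustration only, the extraction can be pictured as matching the path against a regular expression derived from the partition layout. The sketch below is not the library's implementation: the object `PartitionLayoutExtractSketch` and its methods are hypothetical names, and it assumes that the path is relative to the root path, that "/" is the separator, and that a `*` glob matches within a single path segment.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

// Hypothetical helper, not part of the documented API.
object PartitionLayoutExtractSketch {
  // Matches either a placeholder such as %year% or a single * glob.
  private val token: Regex = "%([^%]+)%|\\*".r

  /** Builds a regex with one capture group per placeholder and returns it
    * together with the placeholder (partition column) names in order. */
  def layoutToRegex(layout: String): (Regex, List[String]) = {
    val sb = new StringBuilder
    var cols = List.empty[String]
    var last = 0
    for (m <- token.findAllMatchIn(layout)) {
      // Literal text between tokens is quoted so it is matched verbatim.
      sb.append(Pattern.quote(layout.substring(last, m.start)))
      if (m.matched == "*") sb.append("[^/]*")
      else { sb.append("([^/]+)"); cols = cols :+ m.group(1) }
      last = m.end
    }
    sb.append(Pattern.quote(layout.substring(last)))
    (sb.toString.r, cols)
  }

  /** Returns the partition values if the start of the path matches the layout. */
  def extract(layout: String, relativePath: String): Option[Map[String, String]] = {
    val (regex, cols) = layoutToRegex(layout)
    regex.findPrefixMatchOf(relativePath).map { m =>
      cols.zipWithIndex.map { case (col, i) => col -> m.group(i + 1) }.toMap
    }
  }
}
```

With the layout `%year%/%month%/`, calling `extract` on `2021/05/data.csv` yields the values for `year` and `month`, while a path that does not follow the layout yields `None`.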
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
the factory (object) for this class.
Definition of fileName. Default is an asterisk to match everything. This is concatenated with the partition layout to search for files.
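To make the concatenation concrete, here is a minimal sketch of how a search pattern could be assembled from the root path, a partition string already formatted by the partition layout, and the fileName. The object `SearchPathSketch`, its `searchPattern` method and the "/" separator are assumptions for illustration, not the library's actual code.

```scala
// Hypothetical helper, not part of the documented API.
object SearchPathSketch {
  // Assumed path separator.
  val separator: String = "/"

  /** Concatenates root path, formatted partition layout and fileName.
    * The partition string is appended as-is, so a directory layout
    * has to bring its own trailing separator. */
  def searchPattern(rootPath: String, partitionString: String, fileName: String = "*"): String = {
    val root = rootPath.stripSuffix(separator)
    s"$root$separator$partitionString$fileName"
  }
}
```

Because the pieces are joined without inserting anything between layout and fileName, `searchPattern("/data/in", "2021/05/")` gives `/data/in/2021/05/*`, whereas a layout without the trailing separator would glue the fileName onto the last directory name.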
Handle class cast exception when getting objects from instance registry
List files for given partition values
List of partition values to be filtered. If empty, all files in the root path of the DataObject are listed.
List of FileRefs
Get partition values formatted by partition layout.
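A minimal sketch of such formatting, assuming the layout uses `%<partitionColName>%` placeholders: each placeholder is replaced by the value of its partition column. The object `PartitionLayoutFormatSketch` is hypothetical, and substituting `*` for a column without a value is an assumption of this sketch, not documented behavior.

```scala
import java.util.regex.Matcher
import scala.util.matching.Regex

// Hypothetical helper, not part of the documented API.
object PartitionLayoutFormatSketch {
  // Matches a placeholder such as %year% and captures the column name.
  private val placeholder: Regex = "%([^%]+)%".r

  /** Replaces every placeholder with the matching partition value.
    * Assumption: columns without a value become a * glob. */
  def format(layout: String, values: Map[String, String]): String =
    placeholder.replaceAllIn(layout, m =>
      // quoteReplacement keeps '$' and '\' in values from being interpreted.
      Matcher.quoteReplacement(values.getOrElse(m.group(1), "*")))
}
```

For example, `format("%year%/%month%/", Map("year" -> "2021", "month" -> "05"))` produces `2021/05/`; with only `year` given, the sketch produces `2021/*/`.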
Method for subclasses to override the base path for this DataObject. This is needed, for instance, if pathPrefix is defined in a connection.
Prepare paths to be searched.
A unique identifier for this instance.
List partition values.
Additional metadata for the DataObject
Definition of partition layout. Use %<partitionColName>% as placeholder and * for globs in the layout.
Note: if you have globs in the partition layout, it's not possible to write files to this DataObject.
Note: if this is a directory, you must add a final backslash to the partition layout.
Definition of partition columns
The root path of the files that are handled by this DataObject.
Prepare & test DataObject's prerequisites.
This runs during the "prepare" operation of the DAG.
Overwrite or Append new data. When writing partitioned data, this applies only to partitions concerned.
Default separator for paths.
Given some FileRefs for another DataObject, translate the paths to the root path of this DataObject
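One plausible reading of this translation is a re-rooting of paths: the part of each path below the source root is kept and placed under the target root. The sketch below follows that reading only; `SimpleFileRef` is a simplified, hypothetical stand-in for a file reference, and `translate` with its parameters is an illustrative name, not the actual method signature.

```scala
// Hypothetical helper, not part of the documented API.
object FileRefTranslateSketch {
  // Simplified stand-in for a file reference (assumption).
  final case class SimpleFileRef(fullPath: String, fileName: String)

  /** Moves each reference from below sourceRoot to below targetRoot,
    * keeping the relative part of the path unchanged. */
  def translate(refs: Seq[SimpleFileRef], sourceRoot: String, targetRoot: String): Seq[SimpleFileRef] = {
    val src = sourceRoot.stripSuffix("/")
    val tgt = targetRoot.stripSuffix("/")
    refs.map { ref =>
      // Fail fast if a reference does not belong to the source root.
      require(ref.fullPath.startsWith(src + "/"), s"${ref.fullPath} is not below $src")
      ref.copy(fullPath = tgt + ref.fullPath.substring(src.length))
    }
  }
}
```

Under these assumptions, a reference to `/in/2021/a.csv` translated from root `/in` to root `/out` points to `/out/2021/a.csv`, and the file name stays the same.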