Set to true to enable automatic caching of DataFrames that are used multiple times (default=true).
Set to true to enable a check for duplicate first class object definitions when loading configuration (default=true). The check fails if Connections, DataObjects or Actions are defined in multiple locations.
Set to true to enable a workaround for overwriting an unpartitioned SparkFileDataObject on Azure ADLSv2 (default=false).
List of Hadoop authorities for which ACLs must be configured. The environment parameter can contain multiple authorities separated by commas. An authority is compared against the filesystem URI with contains(...).
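The contains(...) comparison described above can be sketched as follows (an illustrative Python sketch, not SDLB's Scala implementation; the function name and example authorities are hypothetical):

```python
# Illustrative sketch: an authority from the comma-separated parameter value
# matches if the filesystem URI contains it as a substring.
def acls_required(fs_uri: str, authorities_param: str) -> bool:
    authorities = [a.strip() for a in authorities_param.split(",") if a.strip()]
    return any(auth in fs_uri for auth in authorities)

# Hypothetical example: ADLS authorities require ACLs, an unrelated URI does not.
param = "mylake.dfs.core.windows.net,otherlake.dfs.core.windows.net"
print(acls_required("abfss://container@mylake.dfs.core.windows.net/data", param))  # True
print(acls_required("hdfs://namenode:8020/tmp", param))  # False
```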
Set the default Hadoop scheme and authority for paths.
Limit setting ACLs to the basedir (default=true). See hdfsAclsUserHomeLevel or hdfsBasedir on how the basedir is determined.
Modifying ACLs is only allowed at the following path level and below (default=2). See also io.smartdatalake.util.misc.AclUtil.
Overwriting ACLs is only allowed at the following path level and below (default=5). See also io.smartdatalake.util.misc.AclUtil.
Set the path level of the user home to determine the basedir automatically (default=2, e.g. /user/myUserHome).
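The path-level notion used above can be sketched in Python (illustrative only, not SDLB's AclUtil implementation; it assumes the level counts path segments from the root, so /user/myUserHome is at level 2):

```python
# Illustrative sketch: path level = number of non-empty path segments.
def path_level(path: str) -> int:
    return len([p for p in path.split("/") if p])

# Illustrative sketch: derive the basedir by truncating a path below the
# user home to the configured level (default=2).
def basedir_from_user_home(path: str, level: int = 2) -> str:
    parts = [p for p in path.split("/") if p]
    return "/" + "/".join(parts[:level])

print(path_level("/user/myUserHome"))                         # 2
print(basedir_from_user_home("/user/myUserHome/projects/x"))  # /user/myUserHome
```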
Set the basedir explicitly. This overrides the user home detected automatically via hdfsAclsUserHomeLevel for ACL constraints.
Ordering of columns in the SchemaEvolution result: if true, the result schema is ordered according to the existing schema and new columns are appended; if false, the result schema is ordered according to the new schema and deleted columns are appended.
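The two ordering modes can be sketched on plain column-name lists (an illustrative Python sketch, not SDLB's SchemaEvolution implementation; the function name is hypothetical):

```python
# Illustrative sketch: merge existing and new column lists according to the
# two ordering modes described above.
def evolve_columns(existing: list, new: list, ordered_by_existing: bool) -> list:
    if ordered_by_existing:
        # Keep the existing order, append columns that are new.
        return existing + [c for c in new if c not in existing]
    else:
        # Keep the new order, append columns that were deleted.
        return new + [c for c in existing if c not in new]

print(evolve_columns(["a", "b"], ["b", "c"], ordered_by_existing=True))   # ['a', 'b', 'c']
print(evolve_columns(["a", "b"], ["b", "c"], ordered_by_existing=False))  # ['b', 'c', 'a']
```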
If true, schema validation inspects the whole hierarchy of structured data types. This allows partial matches for schemaMin validation. If false, structured data types must match exactly to validate. Example using io.smartdatalake.workflow.dataobject.SchemaValidation.validateSchemaMin: the schema StructType.fromDDL("c1 STRING, c2 STRUCT<c2_1: INT, c2_2: STRING>") validates against StructType.fromDDL("c1 STRING, c2 STRUCT<c2_1: INT>") only if schemaValidationDeepComarison == true.
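The difference between deep and exact comparison can be sketched with nested dicts standing in for struct types (an illustrative Python sketch, not SDLB's validateSchemaMin implementation; names are hypothetical):

```python
# Illustrative sketch: schema_min validates against schema if every required
# column exists with a matching type; with deep=True, struct types may be a
# partial (recursive) match, with deep=False they must match exactly.
def validates_min(schema: dict, schema_min: dict, deep: bool) -> bool:
    for col, min_type in schema_min.items():
        if col not in schema:
            return False
        actual = schema[col]
        if isinstance(min_type, dict) and isinstance(actual, dict):
            if deep:
                if not validates_min(actual, min_type, deep):
                    return False
            elif actual != min_type:  # exact match required
                return False
        elif actual != min_type:
            return False
    return True

schema = {"c1": "string", "c2": {"c2_1": "int", "c2_2": "string"}}
schema_min = {"c1": "string", "c2": {"c2_1": "int"}}
print(validates_min(schema, schema_min, deep=True))   # True
print(validates_min(schema, schema_min, deep=False))  # False
```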
If true, schema validation does not consider nullability of columns/fields when checking for equality. If false, schema validation considers two columns/fields different when their nullability property is not equal.
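The nullability behavior can be sketched as follows (an illustrative Python sketch, not SDLB's implementation; the Field type and function name are hypothetical):

```python
from dataclasses import dataclass

# Illustrative sketch: a minimal field model with name, type and nullability.
@dataclass
class Field:
    name: str
    data_type: str
    nullable: bool

# Compare two fields, optionally ignoring their nullability property.
def fields_equal(a: Field, b: Field, ignore_nullability: bool) -> bool:
    same = a.name == b.name and a.data_type == b.data_type
    if ignore_nullability:
        return same
    return same and a.nullable == b.nullable

f1 = Field("c1", "string", nullable=True)
f2 = Field("c1", "string", nullable=False)
print(fields_equal(f1, f2, ignore_nullability=True))   # True
print(fields_equal(f1, f2, ignore_nullability=False))  # False
```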
Environment-dependent configurations. They can be set
- by Java system properties (prefixed with "sdl.", e.g. "sdl.hadoopAuthoritiesWithAclsRequired")
- by environment variables (prefixed with "SDL_" and camelCase converted to uppercase, e.g. "SDL_HADOOP_AUTHORITIES_WITH_ACLS_REQUIRED")
- by a custom io.smartdatalake.app.SmartDataLakeBuilder implementation for your environment, which sets these variables directly.
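The two naming conventions above can be sketched in Python (illustrative only; the function names are hypothetical, but the prefixes and the camelCase-to-uppercase conversion follow the description):

```python
import re

# Illustrative sketch: derive the Java system property name ("sdl." prefix)
# and the environment variable name ("SDL_" prefix, camelCase converted to
# UPPER_SNAKE_CASE) for a given configuration parameter.
def system_property_name(param: str) -> str:
    return "sdl." + param

def env_var_name(param: str) -> str:
    # Insert "_" before each uppercase letter (except at the start), then uppercase.
    return "SDL_" + re.sub(r"(?<!^)(?=[A-Z])", "_", param).upper()

print(system_property_name("hadoopAuthoritiesWithAclsRequired"))
# sdl.hadoopAuthoritiesWithAclsRequired
print(env_var_name("hadoopAuthoritiesWithAclsRequired"))
# SDL_HADOOP_AUTHORITIES_WITH_ACLS_REQUIRED
```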