Default Smart Data Lake Command Line Application.
Global configuration options
Classes to register for Spark Kryo serialization.
Spark options.
Enable Hive support for the Spark session.
Enable periodic memory usage logging; see MemoryLogTimerConfig for the detailed configuration.
Enable a shutdown hook logger to trace the shutdown cause.
Define state listeners to be registered for receiving events of the execution of the SmartDataLake job.
Define UDFs to be registered in the Spark session. The registered UDFs are available in Spark SQL transformations and expression evaluation, e.g. in the configuration of ExecutionModes.
Define UDFs in Python to be registered in the Spark session. The registered UDFs are available in Spark SQL transformations but not for expression evaluation.
Define SecretProviders to be registered.
Configure a list of exceptions for partitioned DataObject ids which are allowed to overwrite all partitions of a table if no partition values are set. This is used to override/avoid a protective error when using SDLSaveMode.OverwriteOptimized or SDLSaveMode.OverwritePreserveDirectories. Define it as a list of DataObject ids.
Number of executions to keep runtime data for in streaming mode (default = 10). Must be greater than 1.
Trigger interval for synchronous actions in streaming mode, in seconds (default = 60 seconds). The synchronous actions of the DAG will be executed with this interval if possible. Note that asynchronous actions have separate settings, e.g. SparkStreamingMode.triggerInterval.
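Taken together, these global options are set in the `global` section of the HOCON configuration. A minimal sketch, with key names as suggested by the option descriptions above (verify them against the configuration reference of your SDL version):

```hocon
global {
  # Classes to register for Spark Kryo serialization
  kryoClasses = ["com.example.MyCaseClass"]
  # Options passed to the Spark session
  sparkOptions {
    "spark.sql.shuffle.partitions" = 10
  }
  # Enable Hive support for the Spark session
  enableHive = true
  # DataObject ids allowed to overwrite all partitions without partition values
  allowOverwriteAllPartitionsWithoutPartitionValues = [my-dataobject]
  # Streaming mode: runtime data retention and trigger interval
  runtimeDataNumberOfExecutionsToKeep = 10
  synchronousStreamingTriggerIntervalSec = 60
}
```

`com.example.MyCaseClass` and `my-dataobject` are placeholders for illustration.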
Configuration for periodic memory usage logging
Interval in seconds between memory usage logs.
Enable logging of Linux memory usage.
Enable logging details about Linux cgroup memory.
Enable logging details about the different JVM buffers.
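The memory logger options above can be sketched as a `memoryLogTimer` entry in the `global` section; the key names below follow the option descriptions and are assumptions to be checked against MemoryLogTimerConfig:

```hocon
global {
  memoryLogTimer {
    # interval in seconds between memory usage logs
    intervalSec = 60
    # enable logging Linux memory
    logLinuxMem = true
    # enable logging details about Linux cgroup memory
    logLinuxCGroupMem = false
    # enable logging details about different JVM buffers
    logBuffers = false
  }
}
```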
Hooks for modules to interact with sdl-core
SDL Plugin defines an interface to execute custom code on SDL startup and shutdown. Configure it by setting a java system property "sdl.pluginClassName" to a class name implementing SDLPlugin interface. The class needs to have a constructor without any parameters.
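A hedged Scala sketch of such a plugin; the method names `startup` and `shutdown` are assumed from the description above, so check the SDLPlugin trait in sdl-core for the exact interface:

```scala
package com.example

import io.smartdatalake.app.SDLPlugin

// Must be a class with a no-argument constructor, as required above.
class MyPlugin extends SDLPlugin {
  override def startup(): Unit = {
    // custom initialization code executed on SDL startup
    println("SDL starting")
  }
  override def shutdown(): Unit = {
    // custom cleanup code executed on SDL shutdown
    println("SDL shutting down")
  }
}
```

It would then be registered by starting the JVM with `-Dsdl.pluginClassName=com.example.MyPlugin`.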
Abstract Smart Data Lake Command Line Application.
This case class represents a default configuration for the App. It is populated by parsing command-line arguments. It also specifies default values.
Expressions to select the actions to execute. See AppUtil.filterActionList() or commandline help for syntax description.
Application name.
One or multiple configuration files or directories containing configuration files, separated by comma.
The Spark master URL passed to SparkContext when in local mode.
The Spark deploy mode passed to SparkContext when in local mode.
Kerberos user name (username@kerberosDomain) for local mode.
Kerberos domain (username@kerberosDomain) for local mode.
Path to Kerberos keytab file for local mode.
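Together, these options make up the command line of the application. A hedged invocation sketch (flag names as commonly shown in SDL documentation; run the application with `--help` for the authoritative list):

```shell
spark-submit --class io.smartdatalake.app.DefaultSmartDataLakeBuilder sdl-app.jar \
  --name myJob \
  --config ./config \
  --feed-sel myFeed
```

`sdl-app.jar`, `myJob`, and `myFeed` are placeholders.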
Run in test mode:
Interface to notify interested parties about action results & metrics.
Configuration to notify interested parties about action results & metrics.
Fully qualified class name of a class implementing the StateListener interface. The class needs a constructor with one parameter options: Map[String,String].
Options are passed to the StateListener constructor.
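Such a state listener would be configured in the `global` section roughly as follows; `com.example.MyStateListener` is a hypothetical class, and the `stateListeners` key name should be verified against the SDL configuration reference:

```hocon
global {
  stateListeners = [{
    # class implementing the StateListener interface (hypothetical example)
    className = "com.example.MyStateListener"
    # options passed to the StateListener constructor
    options = { notificationUrl = "https://example.com/notify" }
  }]
}
```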
Databricks Smart Data Lake Command Line Application.
As there is an old version of config-*.jar deployed on Databricks, this special App uses a ChildFirstClassLoader to override it in the classpath.
Smart Data Lake Builder application for local mode.
Sets master to local[*] and deployMode to client by default.
Default Smart Data Lake Command Line Application.
Implementation Note: This must be a class and not an object in order to be found by reflection in DatabricksSmartDataLakeBuilder