definitions

Type Members

sealed trait AuthMode extends AnyRef

Authentication modes define how an application authenticates itself to a given data object/connection.
You need to define one of the AuthModes (subclasses) as type, e.g.
```
authMode {
  type = BasicAuthMode
  userVariable = myUser
  passwordVariable = myPassword
}
```
case class BasicAuthMode(userVariable: String, passwordVariable: String) extends AuthMode with Product with Serializable

Derives options for various connection types to connect by basic authentication.
case class Condition(expression: String, description: Option[String] = None) extends Product with Serializable

Definition of a Spark SQL condition with description. This is used for example to define failConditions of PartitionDiffMode.
expression
Condition formulated as Spark SQL. The attributes available are dependent on the context.
description
A textual description of the condition to be shown in error messages.
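As a sketch of how a Condition might appear in configuration, the fragment below lists one entry under failConditions of a PartitionDiffMode. It assumes that configuration keys map one-to-one to the constructor parameters shown above, and that selectedPartitionValues (an attribute of PartitionDiffModeExpressionData) is exposed to Spark SQL as an array. The threshold and the description text are made up for illustration.

```
executionMode {
  type = PartitionDiffMode
  failConditions = [{
    # assumption: selectedPartitionValues can be passed to the Spark SQL size function
    expression = "size(selectedPartitionValues) > 10"
    # hypothetical message, shown in the error if the condition evaluates to true
    description = "More than 10 partitions selected, a previous run may have been missed"
  }]
}
```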
case class CustomPartitionMode(className: String, alternativeOutputId: Option[DataObjectId] = None, options: Map[String, String] = Map()) extends ExecutionMode with ExecutionModeWithMainInputOutput with Product with Serializable

Execution mode to create custom partition execution mode logic. Define a function which receives the main input and output DataObject and returns the partition values to process as Seq[Map[String,String]].
className
class name implementing trait CustomPartitionModeLogic
alternativeOutputId
optional alternative outputId of DataObject later in the DAG. This replaces the mainOutputId. It can be used to ensure processing all partitions over multiple actions in case of errors.
options
Options specified in the configuration for this execution mode
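A minimal configuration sketch for this mode could look as follows. The class name, the DataObject id, and the option key are hypothetical placeholders, and the fragment assumes that configuration keys correspond to the constructor parameters listed above.

```
executionMode {
  type = CustomPartitionMode
  # hypothetical class implementing trait CustomPartitionModeLogic
  className = "com.example.MyPartitionModeLogic"
  # hypothetical id of a DataObject later in the DAG
  alternativeOutputId = my-final-output
  options {
    # hypothetical key, passed to the custom logic as Map[String, String]
    lookbackDays = "7"
  }
}
```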
trait CustomPartitionModeLogic extends AnyRef
case class DefaultExecutionModeExpressionData(feed: String, application: String, runId: Int, attemptId: Int, referenceTimestamp: Option[Timestamp], runStartTime: Timestamp, attemptStartTime: Timestamp, givenPartitionValues: Seq[Map[String, String]], isStartNode: Boolean) extends Product with Serializable

Attributes definition for spark expressions used as ExecutionMode conditions.
givenPartitionValues
Partition values specified with command line (start action) or passed from previous action
isStartNode
True if the current action is a start node of the DAG.
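To illustrate how these attributes might be used, the fragment below sets applyCondition on a PartitionDiffMode, where the parameter is a plain Spark SQL string. It is a sketch under the assumption that the attributes are available by name in the expression; the chosen rule (apply only on start nodes) is only an example.

```
executionMode {
  type = PartitionDiffMode
  # isStartNode is an attribute of DefaultExecutionModeExpressionData
  applyCondition = "isStartNode"
}
```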
sealed trait ExecutionMode extends SmartDataLakeLogger

Execution mode defines how data is selected when running a data pipeline. You need to select one of the subclasses by defining type, e.g.
```
executionMode = {
  type = SparkIncrementalMode
  compareCol = "id"
}
```
case class FailIfNoPartitionValuesMode() extends ExecutionMode with Product with Serializable

An execution mode which just validates that partition values are given. Note: For start nodes of the DAG partition values can be defined by command line, for subsequent nodes partition values are passed on from previous nodes.
case class PartitionDiffMode(partitionColNb: Option[Int] = None, alternativeOutputId: Option[DataObjectId] = None, nbOfPartitionValuesPerRun: Option[Int] = None, applyCondition: Option[String] = None, failCondition: Option[String] = None, failConditions: Seq[Condition] = Seq(), stopIfNoData: Boolean = true, selectExpression: Option[String] = None, applyPartitionValuesTransform: Boolean = false, selectAdditionalInputExpression: Option[String] = None) extends ExecutionMode with ExecutionModeWithMainInputOutput with Product with Serializable

Partition difference execution mode lists partitions on mainInput & mainOutput DataObject and starts loading all missing partitions. Partition columns to be used for comparison need to be a common 'init' of input and output partition columns. This mode needs mainInput/Output DataObjects which CanHandlePartitions to list partitions. Partition values are passed to following actions for partition columns which they have in common.
partitionColNb
optional number of partition columns to use as a common 'init'.
alternativeOutputId
optional alternative outputId of DataObject later in the DAG. This replaces the mainOutputId. It can be used to ensure processing all partitions over multiple actions in case of errors.
nbOfPartitionValuesPerRun
optional restriction of the number of partition values per run.
applyCondition
Condition to decide if execution mode should be applied or not. Define a spark sql expression working with attributes of DefaultExecutionModeExpressionData returning a boolean. Default is to apply the execution mode if given partition values (partition values from command line or passed from previous action) are not empty.
failConditions
List of conditions to fail application of execution mode if true. Define as spark sql expressions working with attributes of PartitionDiffModeExpressionData returning a boolean. Default is that the application of the PartitionDiffMode does not fail the action. If there is no data to process, the following actions are skipped. Multiple conditions are evaluated individually and every condition may fail the execution mode (or-logic).
stopIfNoData
optional setting if further actions should be skipped if this action has no data to process (default). Set stopIfNoData=false if you want to run further actions nevertheless. They will receive output dataObject unfiltered as input.
selectExpression
optional expression to define or refine the list of selected output partitions. Define a spark sql expression working with the attributes of PartitionDiffModeExpressionData returning a list<map<string,string>>. Default is to return the originally selected output partitions found in attribute selectedPartitionValues.
applyPartitionValuesTransform
If true applies the partition values transform of custom transformations on input partition values before comparison with output partition values. If enabled input and output partition columns can be different. Default is to disable the transformation of partition values.
selectAdditionalInputExpression
optional expression to refine the list of selected input partitions. Note that primarily output partitions are selected by PartitionDiffMode. The selected output partitions are then transformed back to the input partitions needed to create the selected output partitions. This is one-to-one except if applyPartitionValuesTransform=true. And sometimes there is a need for additional input data to create the output partitions, e.g. if you aggregate a window of 7 days for every day. You can customize selected input partitions by defining a spark sql expression working with the attributes of PartitionDiffModeExpressionData returning a list<map<string,string>>. Default is to return the originally selected input partitions found in attribute selectedInputPartitionValues.
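The parameters above can be combined in a single configuration block. The following is a sketch only, assuming configuration keys match the constructor parameter names; the concrete values are illustrative.

```
executionMode {
  type = PartitionDiffMode
  # compare only the first partition column of input and output
  partitionColNb = 1
  # process at most 5 missing partition values in one run
  nbOfPartitionValuesPerRun = 5
  # run following actions even if this action has no data to process
  stopIfNoData = false
}
```

With nbOfPartitionValuesPerRun set, partitions that are still missing after a run would presumably be picked up by later runs, since the difference between input and output is listed again each time.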
case class PartitionDiffModeExpressionData(feed: String, application: String, runId: Int, attemptId: Int, referenceTimestamp: Option[Timestamp], runStartTime: Timestamp, attemptStartTime: Timestamp, givenPartitionValues: Seq[Map[String, String]], inputPartitionValues: Seq[Map[String, String]], outputPartitionValues: Seq[Map[String, String]], selectedPartitionValues: Seq[Map[String, String]], selectedInputPartitionValues: Seq[Map[String, String]]) extends Product with Serializable
case class PublicKeyAuthMode(userVariable: String) extends AuthMode with Product with Serializable

Validate by user and private/public key. The private key is read from .ssh.
case class SSLCertsAuthMode(keystorePath: String, keystoreType: Option[String], keystorePassVariable: String, truststorePath: String, truststoreType: Option[String], truststorePassVariable: String) extends AuthMode with Product with Serializable

Validate by SSL certificates: only location and credentials. Additional attributes should be supplied via the options map.
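A configuration sketch for this mode, assuming keys match the constructor parameters. All paths, store types, and variable names below are hypothetical placeholders; the expected format of the two password variables is not described in this section.

```
authMode {
  type = SSLCertsAuthMode
  keystorePath = "/path/to/keystore.jks"       # hypothetical location
  keystoreType = "JKS"                         # optional
  keystorePassVariable = myKeystorePass        # placeholder variable reference
  truststorePath = "/path/to/truststore.jks"   # hypothetical location
  truststoreType = "JKS"                       # optional
  truststorePassVariable = myTruststorePass    # placeholder variable reference
}
```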
case class SparkIncrementalMode(compareCol: String, alternativeOutputId: Option[DataObjectId] = None, stopIfNoData: Boolean = true, applyCondition: Option[Condition] = None) extends ExecutionMode with ExecutionModeWithMainInputOutput with Product with Serializable

Compares max entry in "compare column" between mainOutput and mainInput and incrementally loads the delta. This mode works only with SparkSubFeeds. The filter is not propagated to following actions.
compareCol
a comparable column name existing in mainInput and mainOutput used to identify the delta. Column content should be bigger for newer records.
alternativeOutputId
optional alternative outputId of DataObject later in the DAG. This replaces the mainOutputId. It can be used to ensure processing all partitions over multiple actions in case of errors.
stopIfNoData
optional setting if further actions should be skipped if this action has no data to process (default). Set stopIfNoData=false if you want to run further actions nevertheless. They will receive output dataObject unfiltered as input.
applyCondition
Condition to decide if execution mode should be applied or not. Define a spark sql expression working with attributes of DefaultExecutionModeExpressionData returning a boolean. Default is to apply the execution mode if given partition values (partition values from command line or passed from previous action) are not empty.
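Note that applyCondition is of type Condition for this mode, so it would be configured as a nested object with expression and description rather than as a plain string. A sketch, assuming keys match the constructor parameters; the column name and description are illustrative:

```
executionMode {
  type = SparkIncrementalMode
  compareCol = "id"
  # run following actions even if there is no new data
  stopIfNoData = false
  applyCondition {
    # attribute of DefaultExecutionModeExpressionData
    expression = "isStartNode"
    description = "Apply incremental mode only on start nodes of the DAG"
  }
}
```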
case class SparkStreamingOnceMode(checkpointLocation: String, inputOptions: Map[String, String] = Map(), outputOptions: Map[String, String] = Map(), outputMode: OutputMode = OutputMode.Append) extends ExecutionMode with Product with Serializable

Spark streaming execution mode uses Spark Structured Streaming to incrementally execute data loads (trigger=Trigger.Once) and keep track of processed data. This mode needs a DataObject implementing CanCreateStreamingDataFrame and works only with SparkSubFeeds.
checkpointLocation
location for checkpoints of streaming query to keep state
inputOptions
additional options to apply when reading the streaming source. These overwrite options set by the DataObjects.
outputOptions
additional options to apply when writing to the streaming sink. These overwrite options set by the DataObjects.
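A configuration sketch for this mode, assuming keys match the constructor parameters. The checkpoint path is hypothetical, and the input option shown (startingOffsets, an option of Spark's Kafka source) only makes sense if the input DataObject reads from Kafka.

```
executionMode {
  type = SparkStreamingOnceMode
  # hypothetical location, should be stable across runs to keep state
  checkpointLocation = "/data/checkpoints/my-action"
  inputOptions {
    # assumption: the streaming source is Kafka
    startingOffsets = "earliest"
  }
}
```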
case class TokenAuthMode(tokenVariable: String) extends AuthMode with Product with Serializable

Derives options for various connection types to connect by token.

Value Members

object DateColumnType extends Enumeration

Datatype for date columns in Hive
object Environment

Environment dependent configurations. They can be set
- by Java system properties (prefixed with "sdl.", e.g. "sdl.hadoopAuthoritiesWithAclsRequired")
- by environment variables (prefixed with "SDL_" and camelCase converted to uppercase with underscores, e.g. "SDL_HADOOP_AUTHORITIES_WITH_ACLS_REQUIRED")
- by a custom io.smartdatalake.app.SmartDataLakeBuilder implementation for your environment, which sets these variables directly.
object HiveConventions

Hive conventions
object HiveTableLocationSuffix extends Enumeration

Suffix used for alternating parquet HDFS paths (usually in TickTockHiveTableDataObject for integration layer)
object OutputType extends Enumeration

Options for HDFS output
object TechnicalTableColumn extends Enumeration

Column names specific to historization of Hive tables
