io.smartdatalake.workflow.dataobject
unique name of this data object
DDL-statement to be executed in prepare phase, using output jdbc connection
SQL-statement to be executed in exec phase before reading input table, using input jdbc connection. Use tokens with syntax %{<spark sql expression>} to substitute with values from DefaultExpressionData.
SQL-statement to be executed in exec phase after reading input table and before action is finished, using input jdbc connection. Use tokens with syntax %{<spark sql expression>} to substitute with values from DefaultExpressionData.
SQL-statement to be executed in exec phase before writing output table, using output jdbc connection. Use tokens with syntax %{<spark sql expression>} to substitute with values from DefaultExpressionData.
SQL-statement to be executed in exec phase after writing output table, using output jdbc connection. Use tokens with syntax %{<spark sql expression>} to substitute with values from DefaultExpressionData.
An optional, minimal schema that this DataObject must have to pass schema validation on reading and writing.
The jdbc table to be read
Number of rows to be fetched together by the JDBC driver
Id of JdbcConnection configuration
Any jdbc options according to https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html. Note that some of the options above explicitly set and override some of these options.
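To make the parameters above concrete, here is a hedged sketch of how such a data object might be wired together in Scala. The class and parameter names (JdbcTableDataObject, createSql, jdbcFetchSize, connectionId, jdbcOptions) are assumptions derived from the descriptions above, not a verbatim API:

{{{
// Hypothetical instantiation; names and signatures are assumptions for illustration.
val dataObject = JdbcTableDataObject(
  id            = "btl-customers",                 // unique name of this data object
  createSql     = Some("CREATE TABLE IF NOT EXISTS btl.customers (id INT, name VARCHAR(100))"),
  table         = Table(db = Some("btl"), name = "customers"),  // the jdbc table to be read
  jdbcFetchSize = 1000,                            // rows fetched together by the JDBC driver
  connectionId  = "my-jdbc-connection",            // id of the JdbcConnection configuration
  jdbcOptions   = Map("numPartitions" -> "4")      // passed through to the Spark JDBC data source
)
}}}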
Creates the read schema based on a given write schema. Normally this is the same, but some DataObjects can remove and add columns on read (e.g. KafkaTopicDataObject, SparkFileDataObject). In these cases we have to break the DataFrame lineage and create a dummy DataFrame in the init phase.
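As a hedged illustration of the case described above, an override might extend the write schema with a column that only appears on read; the types are plain Spark types and the added column is purely an assumption:

{{{
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical override: reading this DataObject yields an extra technical column,
// so the read schema is the write schema plus that column. The default behaviour
// (read schema == write schema) would simply return writeSchema unchanged.
def createReadSchema(writeSchema: StructType): StructType =
  StructType(writeSchema.fields :+ StructField("_ingested_at", StringType, nullable = true))
}}}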
DDL-statement to be executed in prepare phase, using output jdbc connection
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
the factory (object) for this class.
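A sketch of the companion-object convention described above. FromConfigFactory is named in the text; its method signature below is an assumption, and the real trait may take additional parameters (e.g. an instance registry):

{{{
import com.typesafe.config.Config

// Assumed minimal shape of the factory trait; the real signature may differ.
trait FromConfigFactory[+T] {
  def fromConfig(config: Config): T
}

// The implementing class returns its companion object as its factory,
// and the companion object knows how to parse the HOCON config.
case class MyDataObject(id: String) {
  def factory: FromConfigFactory[MyDataObject] = MyDataObject
}

object MyDataObject extends FromConfigFactory[MyDataObject] {
  override def fromConfig(config: Config): MyDataObject =
    MyDataObject(id = config.getString("id"))
}
}}}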
Handle class cast exception when getting objects from instance registry
unique name of this data object
Called during init phase for checks and initialization. If possible, don't change the system until the execution phase.
Number of rows to be fetched together by the JDBC driver
Any jdbc options according to https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html. Note that some of the options above explicitly set and override some of these options.
Additional metadata for the DataObject
Runs operations after reading from DataObject.
SQL-statement to be executed in exec phase after reading input table and before action is finished, using input jdbc connection. Use tokens with syntax %{<spark sql expression>} to substitute with values from DefaultExpressionData.
Runs operations after writing to DataObject.
SQL-statement to be executed in exec phase after writing output table, using output jdbc connection. Use tokens with syntax %{<spark sql expression>} to substitute with values from DefaultExpressionData.
Runs operations before reading from DataObject.
SQL-statement to be executed in exec phase before reading input table, using input jdbc connection. Use tokens with syntax %{<spark sql expression>} to substitute with values from DefaultExpressionData.
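For illustration, a hedged example of the token mechanism: the %{<spark sql expression>} syntax is documented above, while the expression names used here (runId, runStartTime) are assumptions about what DefaultExpressionData exposes:

{{{
// Hypothetical preReadSql value; each %{...} token is evaluated as a Spark SQL
// expression against DefaultExpressionData before the statement is executed
// on the input jdbc connection.
val preReadSql = Some(
  "UPDATE staging.load_log SET last_read = '%{runStartTime}' WHERE run_id = %{runId}"
)
}}}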
Runs operations before writing to DataObject. Note: As the transformed SubFeed doesn't yet exist in Action.preWrite, no partition values can be passed as parameters as in preRead.
SQL-statement to be executed in exec phase before writing output table, using output jdbc connection. Use tokens with syntax %{<spark sql expression>} to substitute with values from DefaultExpressionData.
Prepare & test DataObject's prerequisites
This runs during the "prepare" operation of the DAG.
An optional, minimal schema that this DataObject must have to pass schema validation on reading and writing.
The jdbc table to be read
Validate the schema of a given Spark DataFrame df against schemaMin.
The data frame to validate.
SchemaViolationException if schemaMin does not validate.
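A minimal sketch of such a validation, assuming schemaMin is a Spark StructType. SchemaViolationException is named above but not defined here, so a plain RuntimeException stands in for it:

{{{
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Check that every column required by schemaMin exists in df with the same
// data type; throw if a column is missing or its type does not match.
def validateSchemaMin(df: DataFrame, schemaMin: StructType): Unit = {
  val dfFields = df.schema.fields.map(f => f.name -> f.dataType).toMap
  schemaMin.fields.foreach { required =>
    dfFields.get(required.name) match {
      case Some(dt) if dt == required.dataType => // column present with expected type
      case Some(dt) => throw new RuntimeException( // stand-in for SchemaViolationException
        s"Column ${required.name} has type $dt, expected ${required.dataType}")
      case None => throw new RuntimeException(
        s"Column ${required.name} required by schemaMin is missing")
    }
  }
}
}}}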
Write Spark structured streaming DataFrame. The default implementation uses foreachBatch and this trait's writeDataFrame method to write the DataFrame. Some DataObjects will override this with specific implementations (Kafka).
The Streaming DataFrame to write
Trigger frequency for stream
location for checkpoints of streaming query
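A hedged sketch of the foreachBatch approach described above; the streaming calls are standard Spark Structured Streaming, while writeDataFrame is modelled as a simple function parameter rather than the trait's actual method:

{{{
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import org.apache.spark.sql.streaming.Trigger

// Write each micro-batch of the streaming DataFrame through the batch write path.
def writeStreamingDataFrame(df: DataFrame, trigger: Trigger, checkpointLocation: String)
                           (writeDataFrame: DataFrame => Unit): Unit = {
  df.writeStream
    .trigger(trigger)                                  // trigger frequency for the stream
    .option("checkpointLocation", checkpointLocation)  // checkpoints of the streaming query
    .foreachBatch { (batch: Dataset[Row], batchId: Long) =>
      writeDataFrame(batch.toDF)                       // reuse the batch writeDataFrame method
    }
    .start()
}
}}}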
DataObject of type JDBC. Provides details for an action to access tables in a database through JDBC.