Interface to define custom logic for DataFrame creation
Configuration of a custom Spark-DataFrame creator as part of CustomDfDataObject. Define an exec function which receives a map of options and returns a DataFrame to be used as input.
Interface to define a custom Spark-DataFrame transformation (1:1)
Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1). Define a transform function which receives a DataObjectId, a DataFrame and a map of options and has to return a DataFrame, see also CustomDfTransformer.
Note about Python transformation: an environment with Python and PySpark is needed. The PySpark session is initialized and available under the variables sc, session and sqlContext. Other variables available are:
- inputDf: input DataFrame
- options: transformation options as Map[String,String]
- dataObjectId: id of the input DataObject as String
The output DataFrame must be set with setOutputDf(df).
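As an illustration, the pythonCode attribute described above could be used in a configuration like the following sketch. The action name, type and DataObject ids are hypothetical, and the exact nesting of the transformer attributes may vary between SmartDataLake versions:

```hocon
actions {
  filter-active-rows {
    type = CopyAction        # hypothetical action hosting the transformer
    inputId = stg-table
    outputId = int-table
    transformer {
      # inputDf, options and dataObjectId are provided by SmartDataLake;
      # the result must be handed back via setOutputDf
      pythonCode = """
        filteredDf = inputDf.where(inputDf.status == 'active')
        setOutputDf(filteredDf)
        """
    }
  }
}
```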
Optional class name implementing trait CustomDfTransformer
Optional file where scala code for transformation is loaded from. The scala code in the file needs to be a function of type fnTransformType.
Optional scala code for transformation. The scala code needs to be a function of type fnTransformType.
Optional SQL code for transformation. Use tokens %{<key>} to replace with runtimeOptions in SQL code. Example: "select * from test where run = %{runId}"
Optional pythonFile to use for python transformation. The python code can use variables inputDf, dataObjectId and options. The transformed DataFrame has to be set with setOutputDf.
Optional pythonCode to use for python transformation. The python code can use variables inputDf, dataObjectId and options. The transformed DataFrame has to be set with setOutputDf.
Options to pass to the transformation
Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.
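The sqlCode token substitution and runtimeOptions attributes described above can be combined as in the following sketch; the action name, type and DataObject ids are illustrative, not taken from the source:

```hocon
actions {
  select-current-run {
    type = CopyAction        # hypothetical action hosting the transformer
    inputId = stg-data
    outputId = int-data
    transformer {
      # %{runId} is replaced with the evaluated runtimeOption before execution
      sqlCode = "select * from stg_data where run = %{runId}"
      runtimeOptions {
        # Spark SQL expression evaluated against an instance of DefaultExpressionData
        runId = "runId"
      }
    }
  }
}
```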
Interface to define a custom Spark-DataFrame transformation (n:m). Same trait as CustomDfTransformer, but with multiple inputs and outputs supported.
Configuration of a custom Spark-DataFrame transformation between several inputs and outputs (n:m).
Configuration of a custom Spark-DataFrame transformation between several inputs and outputs (n:m). Define a transform function which receives a map of input DataObjectIds with DataFrames and a map of options, and has to return a map of output DataObjectIds with DataFrames, see also trait CustomDfsTransformer.
Optional class name implementing trait CustomDfsTransformer
Optional file where scala code for transformation is loaded from. The scala code in the file needs to be a function of type fnTransformType.
Optional scala code for transformation. The scala code needs to be a function of type fnTransformType.
Optional map of DataObjectId and corresponding SQL Code. Use tokens %{<key>} to replace with runtimeOptions in SQL code. Example: "select * from test where run = %{runId}"
Options to pass to the transformation
Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.
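A sketch of an n:m transformation configured via the className attribute described above. The action name, type, DataObject ids and the transformer class are hypothetical:

```hocon
actions {
  join-orders-customers {
    type = CustomSparkAction   # hypothetical n:m action hosting the transformer
    inputIds = [orders, customers]
    outputIds = [enriched-orders]
    transformer {
      # class implementing trait CustomDfsTransformer (assumed name)
      className = com.example.JoinOrdersTransformer
      options {
        joinType = "left"
      }
    }
  }
}
```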
Interface to define custom file transformation for CustomFileAction
Configuration of custom file transformation between one input and one output (1:1)
Optional class name to load transformer code from
Optional file where scala code for transformation is loaded from
Optional scala code for transformation
Options to pass to the transformation
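The file transformation attributes above might be wired up as in this sketch; action name, type, DataObject ids and the transformer class are illustrative assumptions:

```hocon
actions {
  convert-files {
    type = CustomFileAction    # hypothetical action using the file transformer
    inputId = raw-files
    outputId = clean-files
    transformer {
      # class providing the custom file transformation (assumed name)
      className = com.example.MyFileTransformer
      options {
        encoding = "UTF-8"
      }
    }
  }
}
```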
Interface to create a UserDefinedFunction object to be registered as udf.
Configuration to register a UserDefinedFunction in the spark session of SmartDataLake.
Fully qualified class name of a class implementing the SparkUDFCreator interface. The class needs a constructor without parameters.
Options are passed to SparkUDFCreator apply method.
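A minimal sketch of registering a UDF with the className and options attributes described above. The surrounding section name and the creator class are assumptions; check your SmartDataLake version for the exact location of UDF registrations in the global configuration:

```hocon
global {
  sparkUDFs {                  # assumed name of the UDF registration section
    myUdf {
      # class implementing SparkUDFCreator with a no-argument constructor
      className = com.example.MyUdfCreator
      # passed to the SparkUDFCreator apply method
      options {
        factor = "2"
      }
    }
  }
}
```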
Configuration of a custom Spark-DataFrame creator as part of CustomDfDataObject. Define an exec function which receives a map of options and returns a DataFrame to be used as input. Optionally define a schema function to return a StructType used as schema in the init-phase. See also trait CustomDfCreator.
Note that, for now, implementing the CustomDfCreator.schema method is only possible with the className configuration attribute.
Optional class name implementing trait CustomDfCreator
Optional file where scala code for creator is loaded from. The scala code in the file needs to be a function of type fnExecType.
Optional scala code for creator. The scala code needs to be a function of type fnExecType.
Options to pass to the creator
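The creator attributes above could be combined in a CustomDfDataObject definition like the following sketch; the DataObject id and the creator class are hypothetical:

```hocon
dataObjects {
  generated-data {
    type = CustomDfDataObject
    creator {
      # class implementing trait CustomDfCreator (assumed name);
      # className is required if the schema method should be used
      className = com.example.MyDfCreator
      options {
        numRows = "100"
      }
    }
  }
}
```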