Object

za.co.absa.spark.hats.transformations

NestedArrayTransformations


object NestedArrayTransformations

Linear Supertypes
AnyRef, Any

Type Members

  1. type ExtendedTransformFunction = (Column, GetFieldFunction) ⇒ Column
  2. type GetFieldFunction = (String) ⇒ Column
  3. type TransformFunction = (Column) ⇒ Column

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def addColumnAfter(df: DataFrame, afterColumn: String, columnName: String, expr: Column): DataFrame

    Adds a column, similar to df.withColumn(), but lets you specify the new column's position by naming an existing column after which the new column should be added.
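A minimal sketch of how this might be used; the dataframe and column names are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

// A small example dataframe (columns: id, code)
val df = Seq((1, "a"), (2, "b")).toDF("id", "code")

// Insert the new 'flag' column directly after 'id' rather than at the end
val dfOut = NestedArrayTransformations.addColumnAfter(df, "id", "flag", lit(true))
// Resulting column order: id, flag, code
```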

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. def gatherErrors(df: DataFrame, nestedErrorColumn: String, globalErrorColumn: String): DataFrame

    Gathers errors from a nested error column into a global error column for the dataframe.

    df

    A dataframe containing error columns.

    nestedErrorColumn

    A column name that can be nested deeply inside the dataframe.

    globalErrorColumn

    An error column name at the root schema level. It will be created automatically if it does not exist.

    returns

    A dataframe with a new field that contains the list of errors.
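Assuming a dataframe that already carries per-element error arrays inside a nested structure (the schema below is illustrative), the call collects them into a single root-level array:

```scala
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Illustrative schema:
//  root
//   |-- id: long
//   |-- people: array<struct< name: string, errors: array<struct<...>> >>
//
// Collect all nested 'people.errors' entries into a root-level 'errCol' array
val dfOut = NestedArrayTransformations.gatherErrors(df, "people.errors", "errCol")
```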

  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def nestedAddColumn(df: DataFrame, newColumnName: String, expression: Column): DataFrame

    Adds a column that can be inside nested structs, arrays and their combinations.

    df

    Dataframe to be transformed

    newColumnName

    A column name to be created

    expression

    A new column value

    returns

    A dataframe with a new field that contains transformed values.
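A brief sketch; the schema and names are illustrative:

```scala
import org.apache.spark.sql.functions.lit
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Illustrative schema:
//  root
//   |-- people: array<struct< first_name: string, last_name: string >>
//
// Adds a 'source' field inside each element of the 'people' array
val dfOut = NestedArrayTransformations.nestedAddColumn(df, "people.source", lit("hr-system"))
```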

  16. def nestedAddColumnExtended(df: DataFrame, newColumnName: String, expression: ExtendedTransformFunction): DataFrame

    Adds a column that can be inside nested structs, arrays and their combinations.

    df

    Dataframe to be transformed

    newColumnName

    A column name to be created

    expression

    A function that takes a 'getField()' function and returns a column as a Spark expression.

    returns

    A dataframe with a new field that contains transformed values.
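A sketch of the extended variant under illustrative names; the getField() function lets the new nested field refer to columns outside the array element, such as a root-level 'id':

```scala
import org.apache.spark.sql.functions.concat_ws
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Illustrative schema:
//  root
//   |-- id: long
//   |-- people: array<struct< first_name: string >>
//
// Adds a 'tag' field inside each 'people' element, built from both the
// root-level 'id' and the element's own 'first_name'
val dfOut = NestedArrayTransformations.nestedAddColumnExtended(df, "people.tag",
  (_, getField) => concat_ws("-", getField("id"), getField("people.first_name")))
```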

  17. def nestedDropColumn(df: DataFrame, columnToDrop: String): DataFrame

    Drops a column from inside nested structs, arrays and their combinations.

    df

    Dataframe to be transformed

    columnToDrop

    A column name to be dropped

    returns

    A dataframe with a new field that contains transformed values.
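A brief sketch; the schema is illustrative:

```scala
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Illustrative schema:
//  root
//   |-- people: array<struct< first_name: string, ssn: string >>
//
// Removes the 'ssn' field from every element of the 'people' array
val dfOut = NestedArrayTransformations.nestedDropColumn(df, "people.ssn")
```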

  18. def nestedExtendedStructAndErrorMap(df: DataFrame, inputStructField: String, outputChildField: String, errorColumnName: String, expression: ExtendedTransformFunction, errorCondition: ExtendedTransformFunction): DataFrame

    A nested struct map with error column support. Given a struct field, the method will create a new child field of that struct as a transformation of struct fields and will update the error column according to a specified transformation. This is useful for transformations that require combining several fields of a struct in an array. Extended transformation functions are used so that the caller can access any field in the array path.

    Here is an example:

    val dfOut = nestedExtendedStructAndErrorMap(df, columnPath, "people.addresses.combinedField", "errCol",
      (_, getField) => {
        // Struct transformation
        concat(getField("id"), getField("people.addresses.city"), getField("people.first_name"))
      }, (_, getField) => {
        // Error column transformation
        if (isError(getField("people.addresses.city"))) ErrorCaseClass("Some error") else null
      })
    df

    An input DataFrame

    inputStructField

    A struct column name for which to apply the transformation

    outputChildField

    The output column name that will be added as a child of the source struct.

    errorColumnName

    The name of the error column.

    expression

    A function that applies a transformation to a column as a Spark expression

    errorCondition

    A function that should check error conditions and return an error column in case such conditions are met

    returns

    A dataframe with a new field that contains transformed values.

  19. def nestedExtendedStructMap(df: DataFrame, inputStructField: String, outputChildField: String, expression: ExtendedTransformFunction): DataFrame

    A nested struct map. Given a struct field, the method will create a new child field of that struct as a transformation of struct fields. This is useful for transformations such as concatenation of fields. The method uses extended transformation functions so the caller can access all parent fields as well.

    Here is an example:

    val dfOut = nestedExtendedStructMap(df, columnPath, "people.combinedField", (_, getField) => {
      // A root-level field 'id' is concatenated with the full name field of an array of people.
      concat(getField("id"), lit(" "), getField("people.full_name"))
    })
    df

    An input DataFrame

    inputStructField

    A struct column name for which to apply the transformation

    outputChildField

    The output column name that will be added as a child of the input struct.

    expression

    A function that applies a transformation to a column as a Spark expression

    returns

    A dataframe with a new field that contains transformed values.

  20. def nestedExtendedWithColumnAndErrorMap(df: DataFrame, inputColumnName: String, outputColumnName: String, errorColumnName: String, expression: ExtendedTransformFunction, errorCondition: ExtendedTransformFunction): DataFrame

    A nested map that also appends errors to the error column and uses an extended transformation function, which provides the ability to use fields at parent levels of nesting (see NestedArrayTransformations.nestedWithColumnMap above for usage).

    df

    Dataframe to be transformed

    inputColumnName

    A column name for which to apply the transformation, e.g. company.employee.firstName.

    outputColumnName

    The output column name. The path is optional, e.g. you can use conformedName instead of company.employee.conformedName.

    errorColumnName

    The name of the error column.

    expression

    A function that applies a transformation to a column as a Spark expression.

    errorCondition

    A function that takes an input column and returns an expression for an error column.

    returns

    A dataframe with a new field that contains transformed values.
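A sketch under illustrative names. Here makeErrorColumn() is a hypothetical helper standing in for whatever error-building function the caller has; spark-hats only appends whatever column the errorCondition function returns:

```scala
import org.apache.spark.sql.functions.{upper, when}
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Illustrative: uppercase each nested name, recording an error when it is null.
// 'makeErrorColumn' is a hypothetical helper producing a column in the
// caller's error format.
val dfOut = NestedArrayTransformations.nestedExtendedWithColumnAndErrorMap(
  df,
  "company.employee.firstName",   // input column (may be nested in arrays)
  "conformedName",                // output created at the same nesting level
  "errCol",                       // root-level error column
  (c, _) => upper(c),             // transformation
  (c, getField) =>                // error condition, may inspect parent fields
    when(c.isNull, makeErrorColumn(getField("company.id")))  // null when no error
)
```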

  21. def nestedStructAndErrorMap(df: DataFrame, inputStructField: String, outputChildField: String, errorColumnName: String, expression: TransformFunction, errorCondition: TransformFunction): DataFrame

    A nested struct map with error column support. Given a struct field, the method will create a new child field of that struct as a transformation of struct fields and will update the error column according to a specified transformation. This is useful for transformations that require combining several fields of a struct in an array.

    To use the root of the schema as the input struct, pass "" as the inputStructField. In this case null will be passed to the lambda function.

    Here is an example demonstrating how to handle both root and nested cases:

    val dfOut = nestedStructAndErrorMap(df, columnPath, "combinedField", "errCol", c => {
      // Struct transformation
      if (c == null) {
        // The columns are at the root level
        concat(col("city"), col("street"))
      } else {
        // The columns are inside nested structs/arrays
        concat(c.getField("city"), c.getField("street"))
      }
    }, c => {
      // Error column transformation
      if (c == null) {
        // The columns are at the root level
        if (isError(col("city"))) ErrorCaseClass("Some error") else null
      } else {
        // The columns are inside nested structs/arrays
        if (isError(c.getField("city"))) ErrorCaseClass("Some error") else null
      }
    })
    df

    An input DataFrame

    inputStructField

    A struct column name for which to apply the transformation

    outputChildField

    The output column name that will be added as a child of the source struct.

    errorColumnName

    The name of the error column.

    expression

    A function that applies a transformation to a column as a Spark expression

    errorCondition

    A function that should check error conditions and return an error column in case such conditions are met

    returns

    A dataframe with a new field that contains transformed values.

  22. def nestedStructMap(df: DataFrame, inputStructField: String, outputChildField: String, expression: TransformFunction): DataFrame

    A nested struct map. Given a struct field, the method will create a new child field of that struct as a transformation of struct fields. This is useful for transformations such as concatenation of fields.

    To use the root of the schema as the input struct, pass "" as the inputStructField. In this case null will be passed to the lambda function.

    Here is an example demonstrating how to handle both root and nested cases:

    val dfOut = nestedStructMap(df, columnPath, "combinedField", c => {
      if (c == null) {
        // The columns are at the root level
        concat(col("city"), col("street"))
      } else {
        // The columns are inside nested structs/arrays
        concat(c.getField("city"), c.getField("street"))
      }
    })
    df

    An input DataFrame

    inputStructField

    A struct column name for which to apply the transformation

    outputChildField

    The output column name that will be added as a child of the source struct.

    expression

    A function that applies a transformation to a column as a Spark expression

    returns

    A dataframe with a new field that contains transformed values.

  23. def nestedUnstruct(df: DataFrame, columnToUnstruct: String): DataFrame

    Moves all fields of the specified struct up one level. This can only be invoked on a struct nested inside another struct:

      root
       |-- a: struct
       |    |-- b: struct
       |    |    |-- c: string
       |    |    |-- d: string
    
    df.nestedUnstruct("a.b")
    
      root
       |-- a: struct
       |    |-- c: string
       |    |-- d: string
    columnToUnstruct

    A struct column name that contains the fields to extract.

    returns

    A dataframe with the struct removed and its fields moved up one level.

  24. def nestedWithColumnAndErrorMap(df: DataFrame, inputColumnName: String, outputColumnName: String, errorColumnName: String, expression: TransformFunction, errorCondition: TransformFunction): DataFrame

    A nested map that also appends errors to the error column (see NestedArrayTransformations.nestedWithColumnMap above).

    df

    Dataframe to be transformed

    inputColumnName

    A column name for which to apply the transformation, e.g. company.employee.firstName.

    outputColumnName

    The output column name. The path is optional, e.g. you can use conformedName instead of company.employee.conformedName.

    errorColumnName

    The name of the error column.

    expression

    A function that applies a transformation to a column as a Spark expression.

    errorCondition

    A function that takes an input column and returns an expression for an error column.

    returns

    A dataframe with a new field that contains transformed values.
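A sketch under illustrative names. Here makeErrorColumn() is a hypothetical helper standing in for the caller's error-building function:

```scala
import org.apache.spark.sql.functions.{upper, when}
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Illustrative: 'makeErrorColumn()' is a hypothetical helper producing a
// column in the caller's error format.
val dfOut = NestedArrayTransformations.nestedWithColumnAndErrorMap(
  df,
  "company.employee.firstName",
  "conformedName",
  "errCol",
  c => upper(c),                          // the transformation itself
  c => when(c.isNull, makeErrorColumn())  // null when no error
)
```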

  25. def nestedWithColumnMap(df: DataFrame, inputColumnName: String, outputColumnName: String, expression: TransformFunction): DataFrame

    Map transformation for columns that can be inside nested structs, arrays and their combinations.

    If the input column is a primitive field, the method will add outputColumnName at the same level of nesting by executing the expression, passing the source column into it. If a struct column is expected, you can use the .getField(...) method to operate on its children.

    The output column name can omit the full path as the field will be created at the same level of nesting as the input column.

    df

    Dataframe to be transformed

    inputColumnName

    A column name for which to apply the transformation, e.g. company.employee.firstName.

    outputColumnName

    The output column name. The path is optional, e.g. you can use conformedName instead of company.employee.conformedName.

    expression

    A function that applies a transformation to a column as a Spark expression.

    returns

    A dataframe with a new field that contains transformed values.
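A brief sketch; the schema is illustrative:

```scala
import org.apache.spark.sql.functions.upper
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Illustrative schema:
//  root
//   |-- company: struct
//   |    |-- employee: array<struct< firstName: string >>
//
// Creates 'conformedName' next to 'firstName' inside each array element
val dfOut = NestedArrayTransformations.nestedWithColumnMap(
  df, "company.employee.firstName", "conformedName", c => upper(c))
```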

  26. final def notify(): Unit

    Definition Classes
    AnyRef
  27. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  28. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  29. def toString(): String

    Definition Classes
    AnyRef → Any
  30. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from AnyRef

Inherited from Any
