za.co.absa.spark.hats.transformations
Adds a column similar to df.withColumn(), but allows controlling the position of the new column by providing the name of an existing column after which the new column is inserted.
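A minimal usage sketch of the description above; the method name withColumnAfter and the column names are purely illustrative, not taken from the library's API:

```scala
import org.apache.spark.sql.functions._

// Given a dataframe with columns (id, name, city), add `upper_name`
// directly after `name` rather than at the end of the schema.
// `withColumnAfter` is a hypothetical name for the method described above.
val dfOut = df.withColumnAfter("upper_name", "name", upper(col("name")))
// Intended column order: id, name, upper_name, city
```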
Gathers errors from a nested error column into a global error column for the dataframe
A dataframe containing error columns.
A column name that can be nested deeply inside the dataframe.
An error column name at the root schema level. It will be created automatically if it does not exist.
A dataframe with a new field that contains the list of errors.
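As a sketch of the call described above, assuming the method is NestedArrayTransformations.gatherErrors and using illustrative column names:

```scala
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Gather error records stored deep inside `people.addresses.errors`
// into a root-level error column `errCol` (created if it does not exist).
val dfOut = NestedArrayTransformations.gatherErrors(df, "people.addresses.errors", "errCol")
```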
Adds a column that can be inside nested structs, arrays and their combinations.
Dataframe to be transformed
A column name to be created
A new column value
A dataframe with a new field that contains transformed values.
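A sketch of the simple case, assuming the extension-method syntax of spark-hats (column names illustrative):

```scala
import org.apache.spark.sql.functions._
import za.co.absa.spark.hats.Extensions._

// Add a literal field `c` inside the struct `a.b`, even when `a` or `b`
// are arrays of structs; the field is created at that level of nesting.
val dfOut = df.nestedWithColumn("a.b.c", lit("hello"))
```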
Adds a column that can be inside nested structs, arrays and their combinations.
Dataframe to be transformed
A column name to be created
A function that takes a 'getField()' function and returns a column as a Spark expression.
A dataframe with a new field that contains transformed values.
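A sketch of the extended variant; the lambda receives a getField function that can resolve fields anywhere on the path to the new column (column names illustrative):

```scala
import org.apache.spark.sql.functions._
import za.co.absa.spark.hats.Extensions._

// Combine a root-level field with a field inside the nested array path.
val dfOut = df.nestedWithColumnExtended("people.addresses.full_address", getField =>
  concat(getField("id"), lit(" "), getField("people.addresses.city"))
)
```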
Drops a column from inside nested structs, arrays and their combinations.
Dataframe to be transformed
A column name to be dropped
A dataframe with the specified field dropped.
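For example (a sketch using illustrative column names):

```scala
import za.co.absa.spark.hats.Extensions._

// Remove the nested field `c` from `a.b`, leaving the rest of the struct intact.
val dfOut = df.nestedDropColumn("a.b.c")
```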
A nested struct map with error column support. Given a struct field the method will create a new child field of that struct as a transformation of struct fields and will update the error column according to a specified transformation. This is useful for transformations that require combining several fields of a struct in an array. Extended transformation functions are used so that the caller can access any field in the array path.
Here is an example demonstrating how to handle both root and nested cases:
val dfOut = nestedStructAndErrorMap(df, columnPath, "people.addresses.combinedField",
  (_, getField) => {
    // Struct transformation
    concat(getField("id"), getField("people.addresses.city"), getField("people.first_name"))
  },
  (_, getField) => {
    // Error column transformation
    if (isError(getField("people.addresses.city"))) ErrorCaseClass("Some error") else null
  })
An input DataFrame
A struct column name for which to apply the transformation
The output column name that will be added as a child of the source struct.
The name of the error column.
A function that applies a transformation to a column as a Spark expression
A function that should check error conditions and return an error column in case such conditions are met
A dataframe with a new field that contains transformed values.
A nested struct map. Given a struct field the method will create a new child field of that struct as a transformation of struct fields. This is useful for transformations such as concatenation of fields. The method uses extended transformation functions so the caller can access all parent fields as well.
Here is an example demonstrating how to handle both root and nested cases:
val dfOut = nestedStructMap(df, columnPath, "people.combinedField", (_, getField) => {
  // A root level field 'id' is concatenated with the full name field of an array of people.
  concat(getField("id"), lit(" "), getField("people.full_name"))
})
An input DataFrame
A struct column name for which to apply the transformation
The output column name that will be added as a child of the input struct.
A function that applies a transformation to a column as a Spark expression
A dataframe with a new field that contains transformed values.
A nested map that also appends errors to the error column and uses an extended transformation function that provides the ability to use fields in parent level of nesting. (see NestedArrayTransformations.nestedWithColumnMap above for the usage)
Dataframe to be transformed
A column name for which to apply the transformation, e.g. company.employee.firstName.
The output column name. The path is optional, e.g. you can use conformedName instead of company.employee.conformedName.
The name of the error column.
A function that applies a transformation to a column as a Spark expression.
A function that takes an input column and returns an expression for an error column.
A dataframe with a new field that contains transformed values.
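A sketch of how the parameters fit together, mirroring the extended (_, getField) lambda style of the nestedStructAndErrorMap example; the method name nestedExtendedWithColumnAndErrorMap is an assumption for the method documented here, and ErrorCaseClass and isError are illustrative helpers:

```scala
import org.apache.spark.sql.functions._
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Conform a nested name field and flag rows that fail a validation check.
val dfOut = NestedArrayTransformations.nestedExtendedWithColumnAndErrorMap(df,
  "company.employee.firstName",  // input column
  "conformedName",               // output column (path optional)
  "errCol",                      // error column at the root level
  (_, getField) => upper(getField("company.employee.firstName")),
  (_, getField) =>
    if (isError(getField("company.employee.firstName"))) ErrorCaseClass("Some error") else null
)
```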
A nested struct map with error column support. Given a struct field the method will create a new child field of that struct as a transformation of struct fields and will update the error column according to a specified transformation. This is useful for transformations that require combining several fields of a struct in an array.
To use the root of the schema as the input struct, pass "" as the inputStructField. In this case null will be passed to the lambda function.
Here is an example demonstrating how to handle both root and nested cases:
val dfOut = nestedStructAndErrorMap(df, columnPath, "combinedField", c => {
  // Struct transformation
  if (c == null) {
    // The columns are at the root level
    concat(col("city"), col("street"))
  } else {
    // The columns are inside nested structs/arrays
    concat(c.getField("city"), c.getField("street"))
  }
}, c => {
  // Error column transformation
  if (c == null) {
    // The columns are at the root level
    if (isError(col("city"))) ErrorCaseClass("Some error") else null
  } else {
    // The columns are inside nested structs/arrays
    if (isError(c.getField("city"))) ErrorCaseClass("Some error") else null
  }
})
An input DataFrame
A struct column name for which to apply the transformation
The output column name that will be added as a child of the source struct.
The name of the error column.
A function that applies a transformation to a column as a Spark expression
A function that should check error conditions and return an error column in case such conditions are met
A dataframe with a new field that contains transformed values.
A nested struct map. Given a struct field the method will create a new child field of that struct as a transformation of struct fields. This is useful for transformations such as concatenation of fields.
To use the root of the schema as the input struct, pass "" as the inputStructField. In this case null will be passed to the lambda function.
Here is an example demonstrating how to handle both root and nested cases:
val dfOut = nestedStructMap(df, columnPath, "combinedField", c => {
  if (c == null) {
    // The columns are at the root level
    concat(col("city"), col("street"))
  } else {
    // The columns are inside nested structs/arrays
    concat(c.getField("city"), c.getField("street"))
  }
})
An input DataFrame
A struct column name for which to apply the transformation
The output column name that will be added as a child of the source struct.
A function that applies a transformation to a column as a Spark expression
A dataframe with a new field that contains transformed values.
Moves all fields of the specified struct up one level. This can only be invoked on a struct nested inside another struct.
root
|-- a: struct
| |-- b: struct
| | |-- c: string
| | |-- d: string
df.nestedUnstruct("a.b")
root
|-- a: struct
| |-- c: string
| |-- d: string
A struct column name that contains the fields to extract.
A dataframe with the struct removed and its fields moved up one level.
A nested map that also appends errors to the error column (see NestedArrayTransformations.nestedWithColumnMap above)
Dataframe to be transformed
A column name for which to apply the transformation, e.g. company.employee.firstName.
The output column name. The path is optional, e.g. you can use conformedName instead of company.employee.conformedName.
The name of the error column.
A function that applies a transformation to a column as a Spark expression.
A function that takes an input column and returns an expression for an error column.
A dataframe with a new field that contains transformed values.
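A sketch tying the parameters together; the column names are illustrative, and the error expression uses Spark's when function as a stand-in for a project-specific error record builder:

```scala
import org.apache.spark.sql.functions._
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Uppercase a nested field and record an error message whenever the
// source value is null.
val dfOut = NestedArrayTransformations.nestedWithColumnAndErrorMap(df,
  "company.employee.firstName",
  "conformedName",
  "errCol",
  c => upper(c),
  c => when(c.isNull, lit("firstName is null"))
)
```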
Map transformation for columns that can be inside nested structs, arrays and its combinations.
If the input column is a primitive field the method will add outputColumnName at the same level of nesting by executing the expression, passing the source column into it. If a struct column is expected you can use the .getField(...) method to operate on its children. The output column name can omit the full path as the field will be created at the same level of nesting as the input column.
Dataframe to be transformed
A column name for which to apply the transformation, e.g. company.employee.firstName.
The output column name. The path is optional, e.g. you can use conformedName instead of company.employee.conformedName.
A function that applies a transformation to a column as a Spark expression.
A dataframe with a new field that contains transformed values.
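For instance (a sketch with illustrative column names):

```scala
import org.apache.spark.sql.functions._
import za.co.absa.spark.hats.transformations.NestedArrayTransformations

// Create `conformedName` next to `firstName` inside the possibly
// array-nested `company.employee` struct.
val dfOut = NestedArrayTransformations.nestedWithColumnMap(df,
  "company.employee.firstName",
  "conformedName",
  c => upper(c)
)
```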