Class

za.co.absa.hermes.datasetComparison

DatasetComparator

class DatasetComparator extends AnyRef

Core class of the DatasetComparison module. Use this class when using DatasetComparison as a library. To run DatasetComparison as a Spark job, use DatasetComparisonJob instead.

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new DatasetComparator(dataFrameReference: DataFrame, dataFrameActual: DataFrame, keys: Set[String] = Set.empty[String], config: DatasetComparisonConfig = new TypesafeConfig(None), optionalSchema: Option[StructType] = None)(implicit sparkSession: SparkSession)

    dataFrameReference

    DataFrame used in the comparison as the source of truth.

    dataFrameActual

    DataFrame that is being tested.

    keys

    Set of primary key columns of the dataset. Providing keys greatly increases the accuracy of the output, because the differences can then be pinpointed.

    config

    Config object holding project-level configurable parameters. Unlike cliOptions, these are meant to stay the same for the whole project, while cliOptions change for each test.

    optionalSchema

    Optional schema used to cherry-pick the columns of the two DataFrames to compare. For example, if a timestamp column will never be the same, provide a schema without that column and it will not be compared.

    sparkSession

    Implicit Spark session.
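
The constructor described above could be used roughly as follows. This is a minimal sketch, not an official example: it assumes Spark and the Hermes dataset comparison module are on the classpath, the object name, sample data and column names are hypothetical, and the default values for config and optionalSchema are left in place.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import za.co.absa.hermes.datasetComparison.DatasetComparator

object DatasetComparatorUsage {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the constructor receives it implicitly.
    implicit val spark: SparkSession = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-comparison-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: "id" plays the role of the primary key.
    val reference: DataFrame = Seq((1, "a"), (2, "b")).toDF("id", "value")
    val actual: DataFrame = Seq((1, "a"), (2, "c")).toDF("id", "value")

    // config and optionalSchema fall back to their documented defaults.
    val comparator = new DatasetComparator(reference, actual, Set("id"))

    // compare returns a ComparisonResult; its fields are not listed on this page,
    // so the sketch only prints the object.
    val result = comparator.compare
    println(result)

    spark.stop()
  }
}
```

To leave a column such as an ever-changing timestamp out of the comparison, a StructType without that column would be passed as optionalSchema, wrapped in Some.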

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def checkSchemas(testedDF: ComparisonPair[DataFrame], schema: StructType): Unit

    Checks that the schemas of both DataFrames are supersets of the provided schema.

    testedDF

    Comparison pair of the two DataFrames whose schemas will be tested.

    schema

    Schema that must be a subset of the schemas of both DataFrames.

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def compare: ComparisonResult

    Runs the comparison and returns a ComparisonResult object holding all the data about the final state of the comparison.

    returns

    ComparisonResult with the final state of the comparison that was run.

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. final def notify(): Unit

    Definition Classes
    AnyRef
  16. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  17. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  18. def toString(): String

    Definition Classes
    AnyRef → Any
  19. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
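
The ideas behind checkSchemas and compare in the member list above can be illustrated with a small Spark-free model. This is only a sketch under stated assumptions and not the library's implementation: Field, Row, missingFields and diffByKey are hypothetical names, schemas are modelled as plain sets of fields, and rows as maps from column name to value.

```scala
object ComparisonSketch {
  // Hypothetical stand-in for a schema field: column name plus type name.
  final case class Field(name: String, dataType: String)

  // A row is modelled as column name -> value.
  type Row = Map[String, String]

  // Fields of `required` that `actual` lacks.
  // An empty result means `actual` is a superset of `required`.
  def missingFields(actual: Set[Field], required: Set[Field]): Set[Field] =
    required.diff(actual)

  // Index rows by the values of their key columns.
  // Assumes every row contains all key columns and that keys are unique.
  private def byKey(rows: Seq[Row], keys: Seq[String]): Map[Seq[String], Row] =
    rows.map(row => keys.map(row) -> row).toMap

  // For every key present on both sides, list the columns whose values differ.
  // Keys with no differing columns are left out of the result.
  def diffByKey(
      reference: Seq[Row],
      actual: Seq[Row],
      keys: Seq[String]
  ): Map[Seq[String], Set[String]] = {
    val ref = byKey(reference, keys)
    val act = byKey(actual, keys)
    val shared = ref.keySet.intersect(act.keySet)
    shared.map { key =>
      val (r, a) = (ref(key), act(key))
      key -> (r.keySet ++ a.keySet).filter(col => r.get(col) != a.get(col))
    }.filter(_._2.nonEmpty).toMap
  }
}
```

With key columns available, a difference can be attributed to a specific key and column. Without them, only whole rows could be reported as present on one side and absent on the other.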
