The DataFrame used in the comparison as the source of truth.
The DataFrame that is being tested.
The set of primary keys of the dataset. Providing keys greatly increases the accuracy of the output, because differences can then be pinpointed to individual rows.
Config object holding project-level configurable parameters. Unlike cliOptions, these are meant to stay the same for the whole project, whereas cliOptions change for each test.
Optional schema used to cherry-pick the columns from the two DataFrames that should be compared. For example, if you have a timestamp column that will never be the same, provide a schema without that timestamp column and it will not be compared.
Implicit spark session.
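
The roles of the primary keys and the optional schema described above can be pictured with a minimal sketch. It is not the library's implementation: the object `KeyedSchemaDiffSketch`, the helpers `project` and `differingRows`, the column names, and the sample data are all hypothetical, and only standard Spark SQL calls are used. The schema drops a `loaded_at` column that always differs, and the key `id` lets the differing row be identified.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Illustrative sketch only; none of these names come from DatasetComparison.
object KeyedSchemaDiffSketch {

  // Keep only the columns named in `schema`, so ignored columns
  // (for example a load timestamp) never reach the comparison.
  def project(df: DataFrame, schema: StructType): DataFrame =
    df.select(schema.fieldNames.map(name => col(name)): _*)

  // Full outer join on the key column, then keep the rows whose `name`
  // value differs. `<=>` is Spark's null-safe equality operator.
  def differingRows(reference: DataFrame, tested: DataFrame, key: String): DataFrame =
    reference.as("ref")
      .join(tested.as("act"), Seq(key), "full_outer")
      .filter(!(col("ref.name") <=> col("act.name")))
      .select(
        col(key),
        col("ref.name").as("reference_name"),
        col("act.name").as("tested_name"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("keyed-diff-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: `loaded_at` differs on every row.
    val reference = Seq((1L, "a", "2024-01-01"), (2L, "b", "2024-01-01"))
      .toDF("id", "name", "loaded_at")
    val tested = Seq((1L, "a", "2024-02-01"), (2L, "c", "2024-02-01"))
      .toDF("id", "name", "loaded_at")

    // Schema without `loaded_at`, so that column is not compared.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType)))

    differingRows(project(reference, schema), project(tested, schema), "id").show()
    spark.stop()
  }
}
```

In this sketch only the row with `id` 2 is reported, because its `name` differs between the two DataFrames. Without the schema every row would differ on `loaded_at`, and without the key the difference could not be tied to a specific row.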
Checks whether the schemas of both DataFrames are supersets of the provided schema.
Pair of DataFrames whose schemas will be tested.
Schema that must be a subset of the schemas of both DataFrames.
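
The superset check described above can be sketched as follows. This is an assumption about what such a check might look like, not the library's code: `SchemaSubsetSketch` and `isSuperset` are hypothetical names, fields are matched by name and data type only, and nullability is deliberately ignored.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType, TimestampType}

// Illustrative sketch only; `isSuperset` is a hypothetical helper.
object SchemaSubsetSketch {

  // True when every field of `required` appears in `actual`
  // with the same name and data type (nullability is not checked).
  def isSuperset(actual: StructType, required: StructType): Boolean =
    required.fields.forall { r =>
      actual.fields.exists(a => a.name == r.name && a.dataType == r.dataType)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical schemas of the reference and tested DataFrames.
    val referenceSchema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType),
      StructField("loaded_at", TimestampType)))
    val testedSchema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType)))

    // The schema that both DataFrames are required to cover.
    val required = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType)))

    val bothCover = isSuperset(referenceSchema, required) && isSuperset(testedSchema, required)
    println(s"Both schemas cover the required schema: $bothCover")
  }
}
```

The check only needs the schema types, so it can run before any data is read or compared.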
Runs the comparison and returns a ComparisonResult object containing all the data about the final state of the comparison.
ComparisonResult object with the final state of the comparison that was run.
The core class of the DatasetComparison module. Use this class when using DatasetComparison as a library. To run DatasetComparison as a Spark job, use DatasetComparisonJob instead.