Package

com.github.timgent.sparkdataquality

checkssuite

package checkssuite

Type Members

  1. sealed trait CheckSuiteStatus extends EnumEntry

    Represents the overall status of a CheckSuite

  2. case class ChecksSuite(checkSuiteDescription: String, tags: Map[String, String] = Map.empty, seqSingleDatasetMetricsChecks: Seq[SingleDatasetMetricChecks] = Seq.empty, seqDualDatasetMetricChecks: Seq[DualDatasetMetricChecks] = Seq.empty, singleDatasetChecks: Seq[SingleDatasetCheckWithDs] = Seq.empty, datasetComparisonChecks: Seq[DatasetComparisonCheckWithDs] = Seq.empty, arbitraryChecks: Seq[ArbitraryCheck] = Seq.empty, deequChecks: Seq[DeequCheck] = Seq.empty, metricsPersister: MetricsPersister = NullMetricsPersister, deequMetricsRepository: DeequMetricsRepository = new DeequNullMetricsRepository, qcResultsRepository: QcResultsRepository = new NullQcResultsRepository, checkResultCombiner: (Seq[CheckResult]) ⇒ CheckSuiteStatus = ...) extends ChecksSuiteBase with Product with Serializable

    Main entry point which contains the suite of checks you want to perform

    checkSuiteDescription

    - description of the check suite

    tags

    - any tags associated with the check suite

    seqSingleDatasetMetricsChecks

    - list of metric-based checks to perform on single datasets

    seqDualDatasetMetricChecks

    - list of metric-based checks where the metrics are compared across pairs of datasets

    singleDatasetChecks

    - arbitrary checks performed on single datasets

    datasetComparisonChecks

    - arbitrary checks performed on pairs of datasets

    arbitraryChecks

    - any other arbitrary checks

    deequChecks

    - checks to perform using deequ as the underlying checking mechanism

    metricsPersister

    - how to persist metrics

    deequMetricsRepository

    - how to persist deequ's metrics

    qcResultsRepository

    - how to persist the results of the checks

    checkResultCombiner

    - how the overall result status should be calculated
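
A minimal sketch of building a suite from the constructor signatures listed above. It assumes Spark and this library are on the classpath; the dataset contents, descriptions, tag values, and the `ChecksSuiteSketch` object are made up for illustration. Every parameter other than `checkSuiteDescription` has a default, so only the pieces you need have to be supplied. How the suite is then run is not covered on this page, so the sketch stops at construction.

```scala
import org.apache.spark.sql.SparkSession
import com.github.timgent.sparkdataquality.checkssuite.{ChecksSuite, DescribedDataset, SingleDatasetMetricChecks}

// Hypothetical wrapper object, used only to keep the example self-contained.
object ChecksSuiteSketch {

  def buildSuite(spark: SparkSession): ChecksSuite = {
    import spark.implicits._

    // Illustrative data; any Dataset[_] can be wrapped in a DescribedDataset.
    val orders = Seq((1, "open"), (2, "closed")).toDF("id", "status")
    val describedOrders = DescribedDataset(orders, "orders")

    ChecksSuite(
      checkSuiteDescription = "Order data quality checks",
      tags = Map("team" -> "data-platform"),
      // `checks` defaults to Seq.empty; real metric-based checks would go there.
      seqSingleDatasetMetricsChecks = Seq(SingleDatasetMetricChecks(describedOrders))
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("checks-suite-sketch").getOrCreate()
    try println(buildSuite(spark).checkSuiteDescription)
    finally spark.stop()
  }
}
```

Named arguments keep the call readable given the long parameter list, and leave the persisters and repositories on their no-op defaults.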

  3. trait ChecksSuiteBase extends AnyRef

    Defines a suite of checks to be done

  4. case class ChecksSuiteResult(overallStatus: CheckSuiteStatus, checkSuiteDescription: String, checkResults: Seq[CheckResult], timestamp: Instant, checkTags: Map[String, String]) extends Product with Serializable

    overallStatus

    - Overall status of the CheckSuite, derived from the results of the checks within it

    checkSuiteDescription

    - Description of the suite of checks that was run

    checkResults

    - Sequence of CheckResult for every check in the CheckSuite

    timestamp

    - the time the checks were run

    checkTags

    - any tags associated with the CheckSuite
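
To illustrate how an overall status can be derived from individual check results (the role played by `checkResultCombiner: (Seq[CheckResult]) ⇒ CheckSuiteStatus` in `ChecksSuite`), here is a sketch that uses stand-in types only. `SketchStatus`, `SketchResult`, the three status values, and the "worst status wins" rule are all assumptions made for this example; they are not the library's types and not necessarily its default behaviour.

```scala
// Stand-in types, NOT the library's CheckResult / CheckSuiteStatus.
object OverallStatusSketch {
  sealed trait SketchStatus
  case object Success extends SketchStatus
  case object Warning extends SketchStatus
  case object Error extends SketchStatus

  final case class SketchResult(description: String, status: SketchStatus)

  // Assumed rule: the worst individual status becomes the overall status.
  // An empty suite is treated as Success (also an assumption).
  def combine(results: Seq[SketchResult]): SketchStatus =
    if (results.exists(_.status == Error)) Error
    else if (results.exists(_.status == Warning)) Warning
    else Success

  def main(args: Array[String]): Unit = {
    val results = Seq(
      SketchResult("row count is positive", Success),
      SketchResult("null rate below threshold", Warning)
    )
    println(combine(results)) // prints "Warning"
  }
}
```

Because the combiner is an ordinary function value, a stricter or more lenient rule can be swapped in without touching the checks themselves.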

  5. case class ChecksSuitesResults(results: Seq[ChecksSuiteResult]) extends Product with Serializable

  6. case class DatasetComparisonCheckWithDs(datasets: DescribedDatasetPair, checks: Seq[DatasetComparisonCheck]) extends Product with Serializable

  7. case class DeequCheck(describedDataset: DescribedDataset, checks: Seq[DeequQCCheck]) extends Product with Serializable

  8. case class DescribedDataset(ds: Dataset[_], description: String) extends Product with Serializable

    A dataset with description

    ds

    - the dataset

    description

    - description of the dataset

  9. case class DescribedDatasetPair(dataset: DescribedDataset, datasetToCompare: DescribedDataset) extends Product with Serializable

  10. case class DualDatasetMetricChecks(describedDatasetA: DescribedDataset, describedDatasetB: DescribedDataset, checks: Seq[DualMetricBasedCheck[_]]) extends Product with Serializable

    List of DualMetricBasedChecks to be run on a pair of datasets

    describedDatasetA

    - the first dataset that is part of the comparison

    describedDatasetB

    - the second dataset that is part of the comparison

    checks

    - the checks to be done on the datasets
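
The wiring for a pair of datasets can be sketched the same way, again assuming Spark and this library are on the classpath. The dataset contents, the descriptions, and the `DualDatasetSketch` object are illustrative. Unlike `SingleDatasetMetricChecks`, the `checks` parameter here has no default in the signature above, so it is passed explicitly; it is left empty because the concrete `DualMetricBasedCheck` constructors are not documented on this page.

```scala
import org.apache.spark.sql.SparkSession
import com.github.timgent.sparkdataquality.checkssuite.{ChecksSuite, DescribedDataset, DualDatasetMetricChecks}

// Hypothetical wrapper object, used only to keep the example self-contained.
object DualDatasetSketch {

  def buildSuite(spark: SparkSession): ChecksSuite = {
    import spark.implicits._

    // Two illustrative datasets to be compared with each other.
    val source = DescribedDataset(Seq(1, 2, 3).toDS(), "source table")
    val target = DescribedDataset(Seq(1, 2, 3).toDS(), "target table")

    // Real DualMetricBasedCheck instances would be listed in `checks`.
    val dualChecks = DualDatasetMetricChecks(source, target, checks = Seq.empty)

    ChecksSuite(
      checkSuiteDescription = "Source vs target reconciliation",
      seqDualDatasetMetricChecks = Seq(dualChecks)
    )
  }
}
```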

  11. case class SingleDatasetCheckWithDs(dataset: DescribedDataset, checks: Seq[SingleDatasetCheck]) extends Product with Serializable

  12. case class SingleDatasetMetricChecks(describedDataset: DescribedDataset, checks: Seq[SingleMetricBasedCheck[_]] = Seq.empty) extends Product with Serializable

    List of SingleMetricBasedChecks to be run on a single dataset

    describedDataset

    - the dataset the checks are being run on

    checks

    - a list of checks to be run
