Package

com.spotify.ratatool

diffy

package diffy

Type Members

  1. class AvroDiffy[T <: GenericRecord] extends Diffy[T]

    Field level diff tool for Avro records.

  2. class BigDiffy[T] extends AnyRef

    Big diff between two data sets given a primary key.

  3. case class Delta(field: String, left: Option[Any], right: Option[Any], delta: DeltaValue) extends Product with Serializable

    Delta of a single field between two records.

    field - "." separated field identifier.
    left - Option(left hand side value), None if null.
    right - Option(right hand side value), None if null.
    delta - delta of numerical values.

  4. case class DeltaStats(deltaType: DeltaType.Value, min: Double, max: Double, count: Long, mean: Double, variance: Double, stddev: Double, skewness: Double, kurtosis: Double) extends Product with Serializable

    Delta level statistics, mean, and the four standardized moments.

    deltaType - one of NUMERIC, STRING, VECTOR.
    min - minimum distance seen.
    max - maximum distance seen.
    count - number of differences seen.
    mean - mean of all differences.
    variance - squared deviation from the mean.
    stddev - standard deviation from the mean.
    skewness - measure of data asymmetry in all deltas.
    kurtosis - measure of distribution sharpness and tail thickness in deltas.
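
    A minimal sketch of how the statistics listed above could be computed from a plain sequence of deltas. `SimpleStats` is a hypothetical stand-in, not the library's `DeltaStats`, and it omits `deltaType`. It assumes population (not sample) moments and reports non-excess kurtosis; which convention the library follows is not stated here.

```scala
// Hypothetical stand-in mirroring the documented fields, minus deltaType.
final case class SimpleStats(min: Double, max: Double, count: Long, mean: Double,
                             variance: Double, stddev: Double,
                             skewness: Double, kurtosis: Double)

object SimpleStats {
  // Assumption: population moments over a non-empty input.
  // With zero variance, skewness and kurtosis come out as NaN.
  def of(deltas: Seq[Double]): SimpleStats = {
    require(deltas.nonEmpty, "need at least one delta")
    val n = deltas.size.toDouble
    val mean = deltas.sum / n

    // k-th central moment: average of (d - mean)^k
    def centralMoment(k: Int): Double =
      deltas.map(d => math.pow(d - mean, k)).sum / n

    val variance = centralMoment(2)
    val stddev = math.sqrt(variance)
    val skewness = centralMoment(3) / math.pow(stddev, 3)
    val kurtosis = centralMoment(4) / (variance * variance)

    SimpleStats(deltas.min, deltas.max, deltas.size.toLong, mean,
      variance, stddev, skewness, kurtosis)
  }
}
```

    For example, `SimpleStats.of(Seq(1.0, 2.0, 3.0, 4.0))` yields a mean of 2.5 and a variance of 1.25 under these assumptions.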

  5. sealed trait DeltaValue extends AnyRef

    Delta value of a single node between two records.

  6. abstract class Diffy[T] extends Serializable

    Field level diff tool.

  7. case class FieldStats(field: String, count: Long, fraction: Double, deltaStats: Option[DeltaStats]) extends Product with Serializable

    Field level statistics.

    field - "." separated field identifier.
    count - number of records with different values of the given field.
    fraction - fraction over total number of keys with different records on both sides.
    deltaStats - statistics of field value deltas.

  8. case class GlobalStats(numTotal: Long, numSame: Long, numDiff: Long, numMissingLhs: Long, numMissingRhs: Long) extends Product with Serializable

    Global level statistics.

    numTotal - number of total unique keys.
    numSame - number of keys with same records on both sides.
    numDiff - number of keys with different records on both sides.
    numMissingLhs - number of keys with missing left hand side record.
    numMissingRhs - number of keys with missing right hand side record.
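
    The five counters can be illustrated with two in-memory maps keyed by primary key. This is only a sketch: `GlobalCountsSketch` and `GlobalCounts` are hypothetical names, and record equality is assumed to be plain `==`, which may differ from how the library compares records.

```scala
object GlobalCountsSketch {
  // Hypothetical stand-in with the same field names as the documented class.
  final case class GlobalCounts(numTotal: Long, numSame: Long, numDiff: Long,
                                numMissingLhs: Long, numMissingRhs: Long)

  def compute[K, V](lhs: Map[K, V], rhs: Map[K, V]): GlobalCounts = {
    // Every unique key from either side is counted exactly once.
    val keys = lhs.keySet ++ rhs.keySet
    val pairs = keys.toSeq.map(k => (lhs.get(k), rhs.get(k)))

    val numSame = pairs.count {
      case (Some(l), Some(r)) => l == r
      case _                  => false
    }
    val numDiff = pairs.count {
      case (Some(l), Some(r)) => l != r
      case _                  => false
    }
    val numMissingLhs = pairs.count { case (l, r) => l.isEmpty && r.isDefined }
    val numMissingRhs = pairs.count { case (l, r) => l.isDefined && r.isEmpty }

    GlobalCounts(keys.size.toLong, numSame.toLong, numDiff.toLong,
      numMissingLhs.toLong, numMissingRhs.toLong)
  }
}
```

    In this sketch the four categories partition the key set, so `numSame + numDiff + numMissingLhs + numMissingRhs` equals `numTotal`.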

  9. case class KeyStats(keys: MultiKey, diffType: DiffType.Value, delta: Option[Delta]) extends Product with Serializable

    Key-field level DiffType and delta.

    If the DiffType is SAME, MISSING_LHS, or MISSING_RHS, the key appears once with no Delta. If the DiffType is DIFFERENT, there is one KeyStats for every field that differs for that key, each carrying that field's Delta.

    keys - primary key being compared.
    diffType - how the two records of the given key compare.
    delta - a single field's difference including field name, values, and distance.
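
    The one-row-per-differing-field rule can be sketched as follows. All names here (`KeyStatsSketch`, `FieldDelta`, `Row`) are hypothetical simplifications: records are modeled as flat `Map[String, Any]`, the diff type is a plain string, and the numeric distance is left out.

```scala
object KeyStatsSketch {
  // Hypothetical, simplified stand-ins for Delta and KeyStats.
  final case class FieldDelta(field: String, left: Option[Any], right: Option[Any])
  final case class Row(key: String, diffType: String, delta: Option[FieldDelta])

  def rows(key: String,
           lhs: Option[Map[String, Any]],
           rhs: Option[Map[String, Any]]): Seq[Row] =
    (lhs, rhs) match {
      case (None, None)    => Seq.empty
      // A missing side yields a single row with no delta.
      case (None, Some(_)) => Seq(Row(key, "MISSING_LHS", None))
      case (Some(_), None) => Seq(Row(key, "MISSING_RHS", None))
      case (Some(l), Some(r)) =>
        // Fields whose values differ, in a stable order.
        val changed = (l.keySet ++ r.keySet).toSeq.sorted
          .filter(f => l.get(f) != r.get(f))
        if (changed.isEmpty) Seq(Row(key, "SAME", None))
        else changed.map { f =>
          // One row per differing field, each with its own delta.
          Row(key, "DIFFERENT", Some(FieldDelta(f, l.get(f), r.get(f))))
        }
    }
}
```

    Under these assumptions, a key whose two records differ in two fields produces two DIFFERENT rows, while an identical pair produces a single SAME row.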

  10. final case class MultiKey(keys: Seq[String]) extends AnyVal with Product with Serializable

  11. sealed trait OutputMode extends AnyRef

  12. class ProtoBufDiffy[T <: AbstractMessage] extends Diffy[T]

    Field level diff tool for ProtoBuf records.

  13. class TableRowDiffy extends Diffy[TableRow]

    Field level diff tool for TableRow records.

  14. case class TypedDelta(deltaType: DeltaType.Value, value: Double) extends DeltaValue with Product with Serializable

    Delta value with a known type and computed difference.

Value Members

  1. object BQ extends OutputMode with Product with Serializable

  2. object BigDiffy extends Command

    Big diff between two data sets given a primary key.

  3. object CosineDistance

    Compute cosine distance between two vectors.

  4. object DeltaType extends Enumeration

    Delta type of a single node between two records.

    UNKNOWN - unknown type, no numeric delta is computed.
    NUMERIC - numeric type, e.g. Long, Double, default delta is numeric difference.
    STRING - string type, default delta is Levenshtein edit distance.
    VECTOR - repeated numeric type, default delta is 1.0 - cosine similarity.
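
    The NUMERIC and VECTOR defaults described above can be sketched in a few lines. `DefaultDeltaSketch` is a hypothetical name, and the sign convention for the numeric difference (right minus left) is an assumption, not something this page states.

```scala
object DefaultDeltaSketch {
  // Numeric delta; right - left is an assumed sign convention.
  def numeric(left: Double, right: Double): Double = right - left

  // Vector delta: 1.0 - cosine similarity.
  // Assumes equal-length vectors; a zero vector produces NaN.
  def cosineDistance(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val normX = math.sqrt(x.map(a => a * a).sum)
    val normY = math.sqrt(y.map(b => b * b).sum)
    1.0 - dot / (normX * normY)
  }
}
```

    Orthogonal vectors give a distance of 1.0, and vectors pointing in the same direction give a distance close to 0.0 regardless of magnitude.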

  5. object DiffType extends Enumeration

    Diff type between two records of the same key.

    SAME - the two records are identical.
    DIFFERENT - the two records are different.
    MISSING_LHS - left hand side record is missing.
    MISSING_RHS - right hand side record is missing.

  6. object GCS extends OutputMode with Product with Serializable

  7. object Levenshtein

    Compute Levenshtein edit distance between two strings. See https://rosettacode.org/wiki/Levenshtein_distance#Scala
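
    For reference, a minimal two-row dynamic programming sketch of Levenshtein edit distance. `LevenshteinSketch` is a hypothetical name; this is not necessarily the implementation the library uses.

```scala
object LevenshteinSketch {
  // Minimum number of single-character insertions, deletions,
  // and substitutions needed to turn a into b.
  def distance(a: String, b: String): Int = {
    // prev(j) is the distance between the processed prefix of a and b.take(j).
    var prev: Array[Int] = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i // deleting the first i characters of a
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(curr(j - 1) + 1, prev(j) + 1), // insertion, deletion
          prev(j - 1) + cost                      // substitution or match
        )
      }
      prev = curr
    }
    prev(b.length)
  }
}
```

    Keeping only two rows bounds memory at O(length of b) while the running time stays proportional to the product of the two string lengths.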

  8. object MultiKey extends Serializable

  9. object NumericDelta

    Companion objects for TypedDelta.

  10. object StringDelta

  11. object UnknownDelta extends DeltaValue with Product with Serializable

    Delta value of unknown type.

  12. object VectorDelta
