Field level diff tool for Avro records.
Big diff between two data sets given a primary key.
Delta of a single field between two records.
Delta level statistics, mean, and the four standardized moments.
deltaType - one of NUMERIC, STRING, VECTOR
min - minimum distance seen
max - maximum distance seen
count - number of differences seen
mean - mean of all differences
variance - squared deviation from the mean
stddev - standard deviation from the mean
skewness - measure of data asymmetry in all deltas
kurtosis - measure of distribution sharpness and tail thickness in deltas
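As a rough illustration of how the statistics listed above relate to each other, here is a minimal sketch that derives them from a plain sequence of delta values. The names `DeltaStatsSketch`, `Stats`, and `summarize` are hypothetical and not part of the documented API. The sketch assumes population (divide by n) moments and plain, non-excess kurtosis; the library itself may use a different estimator or an incremental aggregator.

```scala
object DeltaStatsSketch {
  // Hypothetical container mirroring the fields listed above (deltaType omitted).
  final case class Stats(min: Double, max: Double, count: Long, mean: Double,
                         variance: Double, stddev: Double,
                         skewness: Double, kurtosis: Double)

  // Assumes a non-empty sequence; skewness and kurtosis are NaN when variance is zero.
  def summarize(deltas: Seq[Double]): Stats = {
    require(deltas.nonEmpty, "at least one delta is required")
    val n = deltas.size.toDouble
    val mean = deltas.sum / n
    // k-th central moment: average of (x - mean)^k
    def central(k: Int): Double = deltas.map(d => math.pow(d - mean, k)).sum / n
    val variance = central(2)
    val stddev = math.sqrt(variance)
    Stats(
      min = deltas.min, max = deltas.max, count = deltas.size.toLong, mean = mean,
      variance = variance, stddev = stddev,
      // Standardized third and fourth moments (assumed definitions).
      skewness = central(3) / math.pow(stddev, 3),
      kurtosis = central(4) / math.pow(variance, 2)
    )
  }
}
```

For example, `DeltaStatsSketch.summarize(Seq(1.0, 2.0, 3.0, 4.0))` yields a mean of 2.5 and a variance of 1.25 under these assumptions.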
Delta value of a single node between two records.
Field level diff tool.
Field level statistics.
field - "." separated field identifier.
count - number of records with different values of the given field.
fraction - fraction over total number of keys with different records on both sides.
deltaStats - statistics of field value deltas.
Global level statistics.
numTotal - number of total unique keys.
numSame - number of keys with same records on both sides.
numDiff - number of keys with different records on both sides.
numMissingLhs - number of keys with missing left hand side record.
numMissingRhs - number of keys with missing right hand side record.
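The counters above can be pictured with a small in-memory sketch that compares two keyed collections. Everything here (`GlobalStatsSketch`, `classify`, `globalStats`, and the camel-case diff type objects) is a hypothetical stand-in for illustration, assuming records are compared with plain `==`; the real tool works on distributed data sets and field-level comparison.

```scala
object GlobalStatsSketch {
  // Hypothetical stand-ins for the SAME, DIFFERENT, MISSING_LHS, MISSING_RHS diff types.
  sealed trait DiffType
  case object Same extends DiffType
  case object Different extends DiffType
  case object MissingLhs extends DiffType
  case object MissingRhs extends DiffType

  final case class GlobalStats(numTotal: Long, numSame: Long, numDiff: Long,
                               numMissingLhs: Long, numMissingRhs: Long)

  // Classify one key from the optional records on each side.
  def classify[V](lhs: Option[V], rhs: Option[V]): DiffType = (lhs, rhs) match {
    case (Some(l), Some(r)) => if (l == r) Same else Different
    case (None, Some(_))    => MissingLhs
    case (Some(_), None)    => MissingRhs
    case (None, None)       => throw new IllegalArgumentException("key absent on both sides")
  }

  // Count every unique key across both sides by its diff type.
  def globalStats[K, V](lhs: Map[K, V], rhs: Map[K, V]): GlobalStats = {
    val keys = lhs.keySet ++ rhs.keySet
    val counts = keys.toSeq
      .map(k => classify(lhs.get(k), rhs.get(k)))
      .groupBy(identity)
      .map { case (t, ts) => t -> ts.size.toLong }
    def n(t: DiffType): Long = counts.getOrElse(t, 0L)
    GlobalStats(keys.size.toLong, n(Same), n(Different), n(MissingLhs), n(MissingRhs))
  }
}
```

In this sketch numTotal always equals the sum of the other four counters, since each unique key falls into exactly one diff type.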
Key-field level DiffType and delta.
If DiffType is SAME, MISSING_LHS, or MISSING_RHS, the key appears once with no Delta. If DiffType is DIFFERENT, there is one KeyStats for every field that differs for that key, carrying that field's Delta.
keys - primary key being compared.
diffType - how the two records of the given key compare.
delta - a single field's difference, including field name, values, and distance.
Field level diff tool for ProtoBuf records.
Field level diff tool for TableRow records.
Delta value with a known type and computed difference.
Big diff between two data sets given a primary key.
Compute cosine distance between two vectors.
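A minimal sketch of cosine distance, taken here as 1.0 minus cosine similarity, which is how the VECTOR delta is described in this documentation. The object and method names are hypothetical, and the sketch assumes equal-length vectors with non-zero norms; how the library handles empty or zero vectors is not stated in the source.

```scala
object CosineSketch {
  // Assumes equal-length vectors; a zero-norm vector yields NaN.
  def cosineDistance(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val normX = math.sqrt(x.map(a => a * a).sum)
    val normY = math.sqrt(y.map(b => b * b).sum)
    // Distance is 1.0 - cosine similarity.
    1.0 - dot / (normX * normY)
  }
}
```

Under this definition, vectors pointing in the same direction have distance 0, orthogonal vectors have distance 1, and opposite vectors have distance 2.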
Delta type of a single node between two records.
UNKNOWN - unknown type, no numeric delta is computed.
NUMERIC - numeric type, e.g. Long, Double, default delta is numeric difference.
STRING - string type, default delta is Levenshtein edit distance.
VECTOR - repeated numeric type, default delta is 1.0 - cosine similarity.
Diff type between two records of the same key.
SAME - the two records are identical.
DIFFERENT - the two records are different.
MISSING_LHS - left hand side record is missing.
MISSING_RHS - right hand side record is missing.
Compute Levenshtein edit distance between two strings. https://rosettacode.org/wiki/Levenshtein_distance#Scala
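For readers unfamiliar with the metric, here is a minimal sketch of Levenshtein edit distance using the standard two-row dynamic programming formulation. It is not the Rosetta Code version linked above nor necessarily the library's implementation; `LevenshteinSketch` and `levenshtein` are hypothetical names.

```scala
object LevenshteinSketch {
  def levenshtein(s1: String, s2: String): Int = {
    // prev(j) is the distance between the processed prefix of s1 and s2.take(j).
    var prev: Array[Int] = Array.range(0, s2.length + 1)
    for (i <- 1 to s1.length) {
      val curr = new Array[Int](s2.length + 1)
      curr(0) = i // deleting the first i characters of s1
      for (j <- 1 to s2.length) {
        val cost = if (s1(i - 1) == s2(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(curr(j - 1) + 1, prev(j) + 1), // insertion, deletion
          prev(j - 1) + cost                      // substitution or match
        )
      }
      prev = curr
    }
    prev(s2.length)
  }
}
```

As a usage example, `LevenshteinSketch.levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion).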
Companion objects for TypedDelta.
Delta value of unknown type.
Delta of a single field between two records.
"." separated field identifier
Option(left hand side value), None if null
Option(right hand side value), None if null
delta of numerical values