Field level diff tool for Avro records.
Big diff between two data sets given a primary key.
Delta of a single field between two records.
Delta level statistics, mean, and the four standardized moments.
deltaType - one of NUMERIC, STRING, VECTOR.
min - minimum distance seen.
max - maximum distance seen.
count - number of differences seen.
mean - mean of all differences.
variance - squared deviation from the mean.
stddev - standard deviation from the mean.
skewness - measure of data asymmetry in all deltas.
kurtosis - measure of distribution sharpness and tail thickness in deltas.
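As an illustration of the statistics listed above, the following is a minimal sketch that computes them from a sequence of delta distances. The names Moments and Moments.of are hypothetical and not part of the documented API, and the sketch assumes population moments and plain (not excess) kurtosis; the tool itself may use a different definition.

```scala
// Hypothetical container; field names follow the list above, the class name is illustrative.
final case class Moments(min: Double, max: Double, count: Long, mean: Double,
                         variance: Double, stddev: Double, skewness: Double, kurtosis: Double)

object Moments {
  // Population moments of a non-empty sequence of delta distances.
  def of(xs: Seq[Double]): Moments = {
    require(xs.nonEmpty, "need at least one delta")
    val n = xs.size.toDouble
    val mean = xs.sum / n
    // k-th central moment: average of (x - mean)^k
    def central(k: Int): Double = xs.map(x => math.pow(x - mean, k.toDouble)).sum / n
    val variance = central(2)
    val stddev = math.sqrt(variance)
    // Standardized moments are undefined when all deltas are equal; report NaN (an assumption).
    val skewness = if (variance == 0.0) Double.NaN else central(3) / math.pow(stddev, 3.0)
    val kurtosis = if (variance == 0.0) Double.NaN else central(4) / (variance * variance)
    Moments(xs.min, xs.max, xs.size.toLong, mean, variance, stddev, skewness, kurtosis)
  }
}
```

For example, Moments.of(Seq(1.0, 2.0, 3.0, 4.0)) has mean 2.5, variance 1.25, and zero skewness because the deltas are symmetric around the mean.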
Delta value of a single node between two records.
Field level diff tool.
Use ignore to specify the set of fields to ignore during comparison.
Use unordered to specify the set of fields to be treated as unordered, i.e. sorted before comparison.
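The effect of ignore and unordered can be sketched on a simplified record model. This is a hedged illustration only: FieldDiffSketch, Record, and diffFields are hypothetical names, records are modelled as flat maps rather than Avro, ProtoBuf, or TableRow values, and repeated values are sorted by their string form, which is an assumption made for the sketch.

```scala
object FieldDiffSketch {
  // Records are modelled as flat maps from "."-separated field names to values.
  type Record = Map[String, Any]

  // Sort repeated values of unordered fields so element order does not count as a difference.
  private def normalize(field: String, value: Any, unordered: Set[String]): Any =
    value match {
      case xs: Seq[_] if unordered(field) => xs.sortBy(_.toString)
      case other                          => other
    }

  // Returns the sorted names of fields whose values differ, skipping ignored fields.
  def diffFields(lhs: Record, rhs: Record,
                 ignore: Set[String] = Set.empty,
                 unordered: Set[String] = Set.empty): Seq[String] =
    (lhs.keySet ++ rhs.keySet).toSeq.sorted
      .filterNot(ignore)
      .filter { f =>
        lhs.get(f).map(normalize(f, _, unordered)) != rhs.get(f).map(normalize(f, _, unordered))
      }
}
```

With lhs = Map("id" -> 1, "tags" -> Seq("b", "a"), "ts" -> 100) and rhs = Map("id" -> 1, "tags" -> Seq("a", "b"), "ts" -> 200), a plain comparison reports tags and ts as different, while passing ignore = Set("ts") and unordered = Set("tags") reports no differences.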
Field level statistics.
field - "." separated field identifier.
count - number of records with different values of the given field.
fraction - fraction over total number of keys with different records on both sides.
deltaStats - statistics of field value deltas.
Global level statistics.
numTotal - number of total unique keys.
numSame - number of keys with same records on both sides.
numDiff - number of keys with different records on both sides.
numMissingLhs - number of keys with missing left hand side record.
numMissingRhs - number of keys with missing right hand side record.
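The five counters above can be derived from two keyed data sets as in the following sketch. GlobalCounts and GlobalCounts.of are hypothetical names used for illustration, and in-memory maps stand in for the distributed data sets the tool operates on.

```scala
// Hypothetical container mirroring the counters described above.
final case class GlobalCounts(numTotal: Long, numSame: Long, numDiff: Long,
                              numMissingLhs: Long, numMissingRhs: Long)

object GlobalCounts {
  // lhs and rhs map each primary key to its record on that side.
  def of[K, V](lhs: Map[K, V], rhs: Map[K, V]): GlobalCounts = {
    val allKeys = lhs.keySet ++ rhs.keySet
    val both = lhs.keySet.intersect(rhs.keySet)
    // Keys present on both sides whose records are equal.
    val same = both.count(k => lhs(k) == rhs(k))
    GlobalCounts(
      numTotal = allKeys.size.toLong,
      numSame = same.toLong,
      numDiff = (both.size - same).toLong,
      numMissingLhs = (rhs.keySet -- lhs.keySet).size.toLong,
      numMissingRhs = (lhs.keySet -- rhs.keySet).size.toLong
    )
  }
}
```

By construction, numTotal equals numSame + numDiff + numMissingLhs + numMissingRhs.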
Key-field level DiffType and delta.
If DiffType is SAME, MISSING_LHS, or MISSING_RHS, the key appears once with no Delta. If DiffType is DIFFERENT, there is one KeyStats for every field that is different for that key, with that field's Delta.
key - primary key being compared.
diffType - how the two records of the given key compare.
delta - a single field's difference, including field name, values, and distance.
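The rule for how many KeyStats a key produces can be sketched as follows. All names in KeyStatsSketch are hypothetical stand-ins: records are flat maps, keys are strings, and FieldDelta omits the computed distance for brevity.

```scala
object KeyStatsSketch {
  sealed trait DiffType
  case object SAME extends DiffType
  case object DIFFERENT extends DiffType
  case object MISSING_LHS extends DiffType
  case object MISSING_RHS extends DiffType

  // Simplified delta: field name plus both values; the distance is left out of this sketch.
  final case class FieldDelta(field: String, left: Option[Any], right: Option[Any])
  final case class KeyStats(key: String, diffType: DiffType, delta: Option[FieldDelta])

  def keyStats(key: String,
               lhs: Option[Map[String, Any]],
               rhs: Option[Map[String, Any]]): Seq[KeyStats] =
    (lhs, rhs) match {
      // A key only exists if at least one side has a record, so (None, None) is not expected.
      case (None, _) => Seq(KeyStats(key, MISSING_LHS, None))
      case (_, None) => Seq(KeyStats(key, MISSING_RHS, None))
      case (Some(l), Some(r)) =>
        val deltas = (l.keySet ++ r.keySet).toSeq.sorted.collect {
          case f if l.get(f) != r.get(f) => FieldDelta(f, l.get(f), r.get(f))
        }
        // SAME appears once with no delta; DIFFERENT yields one entry per differing field.
        if (deltas.isEmpty) Seq(KeyStats(key, SAME, None))
        else deltas.map(d => KeyStats(key, DIFFERENT, Some(d)))
    }
}
```

A key whose records differ in two fields therefore yields two DIFFERENT entries, while a key missing on one side yields a single entry with no delta.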
Field level diff tool for ProtoBuf records.
Field level diff tool for TableRow records.
Delta value with a known type and computed difference.
Big diff between two data sets given a primary key.
Compute cosine distance between two vectors.
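A minimal sketch of cosine distance, taken here as 1.0 minus cosine similarity. CosineSketch and cosineDistance are hypothetical names, and the handling of mismatched lengths and zero vectors (both rejected) is an assumption of this sketch rather than documented behavior.

```scala
object CosineSketch {
  // 1.0 - cosine similarity; assumes equal-length, non-zero vectors.
  def cosineDistance(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val normX = math.sqrt(x.map(a => a * a).sum)
    val normY = math.sqrt(y.map(b => b * b).sum)
    require(normX > 0 && normY > 0, "cosine distance is undefined for zero vectors")
    1.0 - dot / (normX * normY)
  }
}
```

Orthogonal vectors such as (1, 0) and (0, 1) have distance 1.0, and vectors pointing in the same direction have a distance of approximately 0 regardless of magnitude.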
Delta type of a single node between two records.
UNKNOWN - unknown type, no numeric delta is computed.
NUMERIC - numeric type, e.g. Long, Double, default delta is numeric difference.
STRING - string type, default delta is Levenshtein edit distance.
VECTOR - repeated numeric type, default delta is 1.0 - cosine similarity.
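One way to picture the four delta types is as a classification of a field's runtime value. The sketch below is an assumption about how such a classification could look, not the tool's actual logic; DeltaTypeSketch and kindOf are hypothetical names, and treating an empty sequence as UNKNOWN is a choice made for this example.

```scala
object DeltaTypeSketch {
  sealed trait DeltaType
  case object UNKNOWN extends DeltaType
  case object NUMERIC extends DeltaType
  case object STRING extends DeltaType
  case object VECTOR extends DeltaType

  // Classify a value by the kind of delta that could be computed for it.
  def kindOf(value: Any): DeltaType = value match {
    case _: Number       => NUMERIC // java.lang.Number covers boxed Long, Double, etc.
    case _: CharSequence => STRING
    // Repeated numeric values; an empty sequence carries no element type, so it stays UNKNOWN.
    case xs: Seq[_] if xs.nonEmpty && xs.forall(_.isInstanceOf[Number]) => VECTOR
    case _               => UNKNOWN
  }
}
```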
Diff type between two records of the same key.
SAME - the two records are identical.
DIFFERENT - the two records are different.
MISSING_LHS - left hand side record is missing.
MISSING_RHS - right hand side record is missing.
Compute Levenshtein edit distance between two strings.
See https://rosettacode.org/wiki/Levenshtein_distance#Scala
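For reference, a minimal sketch of Levenshtein edit distance using the standard dynamic-programming formulation with two rows. This is not necessarily the implementation the tool uses or the one at the URL above; LevenshteinSketch and distance are hypothetical names.

```scala
object LevenshteinSketch {
  // Minimum number of single-character insertions, deletions, and substitutions
  // needed to turn s into t. Only the previous row of the DP table is kept.
  def distance(s: String, t: String): Int = {
    // Row 0: turning the empty prefix of s into the first j characters of t costs j insertions.
    var prev = Array.range(0, t.length + 1)
    for (i <- 1 to s.length) {
      val curr = new Array[Int](t.length + 1)
      curr(0) = i // deleting the first i characters of s
      for (j <- 1 to t.length) {
        val cost = if (s(i - 1) == t(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(curr(j - 1) + 1, prev(j) + 1), // insertion, deletion
          prev(j - 1) + cost                      // substitution or match
        )
      }
      prev = curr
    }
    prev(t.length)
  }
}
```

For example, LevenshteinSketch.distance("kitten", "sitting") is 3: two substitutions and one insertion.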
Companion objects for TypedDelta.
Delta value of unknown type.
Delta of a single field between two records.
"." separated field identifier
left hand side value
right hand side value
delta of numerical values