Calculates basic summary stats (min, max, mean, sample stddev) on an array of double-typed values. These are calculated using a one-pass algorithm described in https://arxiv.org/abs/1510.04923
The algorithm used is adapted from org.apache.spark.sql.catalyst.expressions.aggregate.CentralMomentAgg.
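The one-pass approach keeps a running count, mean and second central moment (M2), plus min and max. Each value updates them as it is read. The sketch below illustrates this for a plain Array[Double] with no nulls. It follows the incremental mean/M2 update pattern that, to our understanding, CentralMomentAgg uses, but it is not the library's code. The names SummaryStats and OnePassStats are hypothetical.

```scala
// Hypothetical result type; field names are illustrative only.
final case class SummaryStats(min: Double, max: Double, mean: Double, stdDev: Double)

object OnePassStats {
  /** Single pass over xs, updating count, mean, M2 (second central moment), min and max. */
  def summarize(xs: Array[Double]): SummaryStats = {
    var n = 0L
    var mean = 0.0
    var m2 = 0.0
    var min = Double.PositiveInfinity
    var max = Double.NegativeInfinity
    for (x <- xs) {
      n += 1
      val delta = x - mean
      val deltaN = delta / n
      mean += deltaN
      // Same as delta * (x - updatedMean); avoids a second pass over the data.
      m2 += delta * (delta - deltaN)
      if (x < min) min = x
      if (x > max) max = x
    }
    if (n == 0) SummaryStats(Double.NaN, Double.NaN, Double.NaN, Double.NaN)
    else {
      // Sample (n - 1) standard deviation; NaN when only one value is present (assumed convention).
      val sd = if (n > 1) math.sqrt(m2 / (n - 1)) else Double.NaN
      SummaryStats(min, max, mean, sd)
    }
  }
}
```

For example, summarizing Array(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0) gives min 2, max 9, mean 5 and a sample stddev of sqrt(32/7). The same count/mean/M2 state can also be merged across partitions, which is what makes the formulation attractive for aggregation. That merge step is not shown here.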
Calculates a variety of summary stats on the calls for a given site. This method returns a case class so that the output can easily be used by other QC functions as well as returned directly to the user.
an array of structs with the schema defined in CallStats.requiredSchema
the position of the calls within the element struct of the genotypes array
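The sketch below illustrates the kind of per-site summary this function produces, without Spark. It assumes each genotype's calls are an array of allele indices, with -1 marking a missing allele. It takes the calls directly instead of looking them up by position in the genotype struct. The output fields (callRate, nCalled, nUncalled, nHet, alleleCounts, alleleFrequencies) are hypothetical and are not taken from CallStats.requiredSchema or the method's actual return type.

```scala
// Hypothetical output type; field names are illustrative only.
final case class CallSummary(
    callRate: Double,
    nCalled: Int,
    nUncalled: Int,
    nHet: Int,
    alleleCounts: Array[Int],
    alleleFrequencies: Array[Double])

object CallSummaryExample {
  /** Each element is one sample's calls; -1 marks a missing allele (assumed encoding). */
  def summarize(calls: Seq[Array[Int]]): CallSummary = {
    // A genotype counts as called only if it is non-empty and has no missing allele.
    val called = calls.filter(c => c.nonEmpty && c.forall(_ >= 0))
    val nCalled = called.size
    val nUncalled = calls.size - nCalled
    // Heterozygous: a called genotype carrying more than one distinct allele.
    val nHet = called.count(c => c.distinct.length > 1)
    val nAlleles = if (called.isEmpty) 0 else called.map(_.max).max + 1
    val alleleCounts = new Array[Int](nAlleles)
    for (c <- called; a <- c) alleleCounts(a) += 1
    val total = alleleCounts.sum
    val freqs = alleleCounts.map(cnt => if (total == 0) 0.0 else cnt.toDouble / total)
    val callRate = if (calls.isEmpty) 0.0 else nCalled.toDouble / calls.size
    CallSummary(callRate, nCalled, nUncalled, nHet, alleleCounts, freqs)
  }
}
```

Returning a single case class fits the description above: downstream code (for example a Hardy-Weinberg test) can read the counts it needs without recomputing them.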
Performs a two-sided test of Hardy-Weinberg equilibrium. Returns the expected het frequency as well as the associated p-value.
an array of structs with the schema required by CallStats
the position of the genotype struct (with calls and phasing info) within the element struct of the genotypes array
a row with the schema of HardyWeinbergStruct
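The doc comment does not say which test is used. One common two-sided choice is the exact test of Wigginton, Cutler and Abecasis (2005). The sketch below implements that recurrence for a biallelic site from homozygous-ref, het and homozygous-alt counts. It computes the expected het frequency as 2pq. Whether this matches the implementation behind HardyWeinbergStruct is an assumption, and HweExactTest and HweResult are hypothetical names.

```scala
// Hypothetical result type: expected het frequency and two-sided p-value.
final case class HweResult(expectedHetFreq: Double, pValue: Double)

object HweExactTest {
  /** Two-sided exact HWE test for a biallelic site (Wigginton et al. 2005 recurrence). */
  def exactTest(nHomRef: Int, nHet: Int, nHomAlt: Int): HweResult = {
    require(nHomRef >= 0 && nHet >= 0 && nHomAlt >= 0, "counts must be non-negative")
    val n = nHomRef + nHet + nHomAlt
    if (n == 0) HweResult(Double.NaN, Double.NaN)
    else {
      // Expected het frequency under HWE: 2pq, where p is the ref allele frequency.
      val p = (2.0 * nHomRef + nHet) / (2.0 * n)
      val expectedHet = 2.0 * p * (1.0 - p)

      val homRare = math.min(nHomRef, nHomAlt)
      val homCommon = math.max(nHomRef, nHomAlt)
      val rareCopies = 2 * homRare + nHet
      // Unnormalized probabilities of each possible het count (only matching parity is non-zero).
      val probs = new Array[Double](rareCopies + 1)

      // Start near the most likely het count, with the same parity as rareCopies.
      var mid = (rareCopies.toLong * (2L * n - rareCopies) / (2L * n)).toInt
      if ((mid & 1) != (rareCopies & 1)) mid += 1
      probs(mid) = 1.0

      // Walk downwards: fewer hets, more homozygotes.
      var hets = mid
      var homR = (rareCopies - mid) / 2
      var homC = n - hets - homR
      while (hets > 1) {
        probs(hets - 2) = probs(hets) * hets * (hets - 1.0) / (4.0 * (homR + 1.0) * (homC + 1.0))
        hets -= 2; homR += 1; homC += 1
      }
      // Walk upwards: more hets, fewer homozygotes.
      hets = mid
      homR = (rareCopies - mid) / 2
      homC = n - hets - homR
      while (hets <= rareCopies - 2) {
        probs(hets + 2) = probs(hets) * 4.0 * homR * homC / ((hets + 2.0) * (hets + 1.0))
        hets += 2; homR -= 1; homC -= 1
      }
      val total = probs.sum
      val observed = probs(nHet)
      // Two-sided: sum the probability of every het count no more likely than the observed one.
      val pValue = math.min(1.0, probs.filter(_ <= observed).sum / total)
      HweResult(expectedHet, pValue)
    }
  }
}
```

For counts of 25/50/25 the observed het count is the most likely one, so the p-value is close to 1. For 50/0/50 the complete absence of hets gives a vanishingly small p-value.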
Converts an array of struct-typed expressions into an array of slimmed-down structs containing a subset of the fields.
We use this function for many of the variant QC functions so that each function can require a specific schema.
the desired schema
an array of struct-typed expressions that contains a superset of the fields in schema
a transformed array of struct-typed expressions whose structs conform to schema
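The projection itself is simple: each struct keeps only the requested fields, in the requested order, and a missing field is an error. The sketch below is Spark-free and models a struct as a Map from field name to value and a schema as a list of field names. Both are hypothetical simplifications. The real function builds Catalyst expressions and also carries field types, which this sketch ignores.

```scala
object SubsetStructs {
  // Hypothetical stand-in for a struct value: field name -> value.
  type Struct = Map[String, Any]

  /** Projects each struct onto fieldNames, preserving the requested field order. */
  def subsetStructs(fieldNames: Seq[String], structs: Seq[Struct]): Seq[Seq[(String, Any)]] =
    structs.map { struct =>
      fieldNames.map { name =>
        // The input must contain a superset of the requested fields.
        name -> struct.getOrElse(name, throw new IllegalArgumentException(s"missing field: $name"))
      }
    }
}
```

Keeping the projection in one helper is what lets each QC function declare only the fields it needs. Extra fields in the input, such as per-sample annotations, are simply dropped.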
Contains implementations of QC functions. These implementations are called during both whole-stage codegen and interpreted execution.
The functions are exposed to the user as Catalyst expressions.