Adds fields to a struct.
The struct to which fields will be added
The new fields to add. The arguments must alternate between string-typed literal field names and field values.
A struct consisting of the input struct and the added fields
0.3.0
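The alternating name/value convention can be modeled with a Python dict standing in for a struct. This is a plain-Python sketch, not Glow's implementation. The function name `add_fields_sketch` is hypothetical. Overwriting an existing field of the same name is an assumption, because the description above does not cover that case.

```python
from typing import Any, Dict


def add_fields_sketch(struct: Dict[str, Any], *fields: Any) -> Dict[str, Any]:
    """Return a copy of `struct` extended with alternating (name, value) arguments."""
    if len(fields) % 2 != 0:
        raise ValueError("fields must alternate between names and values")
    result = dict(struct)  # leave the input struct untouched
    for name, value in zip(fields[::2], fields[1::2]):
        if not isinstance(name, str):
            raise TypeError(f"field name must be a string, got {name!r}")
        # Assumption: a repeated name simply overwrites the earlier value.
        result[name] = value
    return result
```

For example, `add_fields_sketch({"a": 1}, "b", 2, "c", 3)` yields a struct with fields `a`, `b` and `c`.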
Computes custom per-sample aggregates.
An array of values
The initial value
The update function
The merge function
The evaluate function
An array of aggregated values. The number of elements in the array is equal to the number of samples.
0.3.0
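The four functions follow the usual distributed-aggregation pattern. `update` folds one value into an accumulator. `merge` combines accumulators built on different partitions. `evaluate` turns the final accumulator into the result. The plain-Python sketch below keeps one accumulator per array index, which is one per sample.

It assumes each row's array holds one value per sample in a fixed order. The names `aggregate_by_index_sketch` and `num_partitions` are hypothetical, and the partitioning only imitates what Spark would do.

```python
from typing import Callable, List, Sequence, TypeVar

V = TypeVar("V")  # element type
A = TypeVar("A")  # accumulator type
R = TypeVar("R")  # result type


def aggregate_by_index_sketch(
    rows: Sequence[Sequence[V]],
    init: A,
    update: Callable[[A, V], A],
    merge: Callable[[A, A], A],
    evaluate: Callable[[A], R],
    num_partitions: int = 2,
) -> List[R]:
    """Fold each sample's values across rows, one accumulator per array index."""
    if not rows:
        raise ValueError("need at least one row")
    n_samples = len(rows[0])
    # Imitate Spark by splitting rows into partitions that are aggregated separately.
    num_partitions = max(1, min(num_partitions, len(rows)))
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]

    partials = []
    for part in partitions:
        accs = [init] * n_samples  # init should be immutable (e.g. a tuple)
        for row in part:
            accs = [update(acc, value) for acc, value in zip(accs, row)]
        partials.append(accs)

    # Combine per-partition accumulators index by index, then finish each one.
    merged = partials[0]
    for accs in partials[1:]:
        merged = [merge(a, b) for a, b in zip(merged, accs)]
    return [evaluate(acc) for acc in merged]
```

A per-sample mean, for instance, uses a `(sum, count)` accumulator: `init=(0.0, 0)`, `update=lambda acc, v: (acc[0] + v, acc[1] + 1)`, `merge=lambda a, b: (a[0] + b[0], a[1] + b[1])` and `evaluate=lambda acc: acc[0] / acc[1]`.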
Computes the minimum, maximum, mean, and standard deviation for an array of numerics.
An array of any numeric type
A struct containing double ``mean``, ``stdDev``, ``min``, and ``max`` fields
0.3.0
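A plain-Python sketch of the four statistics follows, returned under the same field names. The name `array_summary_stats_sketch` is hypothetical. Using the sample standard deviation (an `n - 1` denominator) is an assumption, because the description above does not say which estimator is used. Missing-value handling is not modeled either.

```python
import math
from typing import Dict, Sequence


def array_summary_stats_sketch(values: Sequence[float]) -> Dict[str, float]:
    """Return mean, stdDev, min and max of a non-empty numeric array as doubles."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    # Assumption: sample standard deviation (n - 1 denominator); undefined for n == 1.
    std_dev = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)) if n > 1 else math.nan
    return {"mean": float(mean), "stdDev": std_dev, "min": float(min(values)), "max": float(max(values))}
```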
Converts an array of numerics into a spark.ml ``DenseVector``.
The array of numerics
A spark.ml ``DenseVector``
0.3.0
Converts an array of numerics into a spark.ml ``SparseVector``.
The array of numerics
A spark.ml ``SparseVector``
0.3.0
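The difference between the two vector conversions is that a dense vector keeps every element, while a sparse vector keeps only the size plus the indices and values of the non-zero entries. The sketch below builds both representations in plain Python. The result `(size, indices, values)` has the same shape as the arguments accepted by `pyspark.ml.linalg.Vectors.sparse`. The function names are hypothetical and this is not how Glow performs the conversion.

```python
from typing import List, Sequence, Tuple


def array_to_dense_sketch(values: Sequence[float]) -> List[float]:
    """Dense representation: every element, stored as a double."""
    return [float(v) for v in values]


def array_to_sparse_sketch(values: Sequence[float]) -> Tuple[int, List[int], List[float]]:
    """Sparse representation: (size, indices of non-zero entries, their values)."""
    indices = [i for i, v in enumerate(values) if v != 0]
    return len(values), indices, [float(values[i]) for i in indices]
```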
Computes call summary statistics for an array of genotype structs.
Computes call summary statistics for an array of genotype structs. See :ref:`variant-qc` for more details.
The array of genotype structs with a ``calls`` field
A struct containing ``callRate``, ``nCalled``, ``nUncalled``, ``nHet``, ``nHomozygous``, ``nNonRef``, ``nAllelesCalled``, ``alleleCounts``, and ``alleleFrequencies`` fields. See :ref:`variant-qc`.
0.3.0
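Several of these fields can be computed from the ``calls`` arrays alone. The plain-Python sketch below covers only a subset (``callRate``, ``nCalled``, ``nUncalled``, ``nHet``, ``nAllelesCalled``, ``alleleCounts``, ``alleleFrequencies``) and takes bare calls arrays instead of genotype structs.

Two definitions are assumptions rather than statements from the text above: a genotype counts as called only if none of its calls is `-1`, and allele counts come from called genotypes only. The name `call_summary_sketch` is hypothetical.

```python
import math
from typing import Dict, Sequence


def call_summary_sketch(calls: Sequence[Sequence[int]], num_alleles: int) -> Dict[str, object]:
    """Summarize one variant's genotypes, given each genotype's calls array."""
    # Assumption: a genotype is "called" only if it has calls and none of them is -1.
    called = [g for g in calls if len(g) > 0 and all(c >= 0 for c in g)]
    n_called = len(called)
    # A called genotype is heterozygous if it carries more than one distinct allele.
    n_het = sum(1 for g in called if len(set(g)) > 1)
    allele_counts = [0] * num_alleles
    for g in called:
        for allele in g:
            allele_counts[allele] += 1
    n_alleles_called = sum(allele_counts)
    return {
        "callRate": n_called / len(calls) if calls else math.nan,
        "nCalled": n_called,
        "nUncalled": len(calls) - n_called,
        "nHet": n_het,
        "nAllelesCalled": n_alleles_called,
        "alleleCounts": allele_counts,
        "alleleFrequencies": [c / n_alleles_called if n_alleles_called else math.nan for c in allele_counts],
    }
```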
Computes summary statistics for the depth field from an array of genotype structs.
Computes summary statistics for the depth field from an array of genotype structs. See :ref:`variant-qc`.
An array of genotype structs with a ``depth`` field
A struct containing ``mean``, ``stdDev``, ``min``, and ``max`` of genotype depths
0.3.0
Promotes fields of a nested struct to top-level columns similar to using ``struct.*`` from SQL, but can be used in more contexts.
The struct to expand
Columns corresponding to fields of the input struct
0.3.0
Explodes a spark.ml ``Matrix`` (sparse or dense) into multiple arrays, one per row of the matrix.
The spark.ml ``Matrix`` to explode
An array column in which each row is a row of the input matrix
0.3.0
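Exploding a matrix into rows requires care about storage order. `pyspark.ml.linalg.DenseMatrix` stores its values in column-major order unless `isTransposed` is set, so row `r` must be gathered from every column. The plain-Python sketch below handles only the dense, non-transposed layout. The name `explode_dense_matrix_sketch` is hypothetical.

```python
from typing import List, Sequence


def explode_dense_matrix_sketch(num_rows: int, num_cols: int, values: Sequence[float]) -> List[List[float]]:
    """Split a column-major dense matrix into one list per row."""
    if len(values) != num_rows * num_cols:
        raise ValueError("values must hold num_rows * num_cols entries")
    # Entry (r, c) of a column-major matrix lives at index c * num_rows + r.
    return [[float(values[c * num_rows + r]) for c in range(num_cols)] for r in range(num_rows)]
```

For a 2 x 3 matrix with values `[1, 2, 3, 4, 5, 6]`, the columns are `[1, 2]`, `[3, 4]` and `[5, 6]`, so the exploded rows are `[1.0, 3.0, 5.0]` and `[2.0, 4.0, 6.0]`.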
Gets the number of alternate alleles for an array of genotype structs.
Gets the number of alternate alleles for an array of genotype structs. Returns ``-1`` if there are any ``-1`` s (no-calls) in the calls array.
An array of genotype structs with a ``calls`` field
An array of integers containing the number of alternate alleles in each call array
0.3.0
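The rule above can be sketched in a few lines of plain Python, taking bare calls arrays instead of genotype structs. The sketch counts the non-reference (non-zero) entries of each calls array. At biallelic sites this agrees with summing the allele indices. At multiallelic sites the two readings differ, and the description above does not say which is meant. The name `genotype_states_sketch` is hypothetical.

```python
from typing import List, Sequence


def genotype_states_sketch(calls_per_genotype: Sequence[Sequence[int]]) -> List[int]:
    """Number of alternate alleles per calls array, or -1 if any call is a no-call."""
    states = []
    for calls in calls_per_genotype:
        if any(c == -1 for c in calls):
            states.append(-1)  # a single no-call makes the whole genotype -1
        else:
            states.append(sum(1 for c in calls if c != 0))
    return states
```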
Computes summary statistics about the genotype quality field for an array of genotype structs.
Computes summary statistics about the genotype quality field for an array of genotype structs. See :ref:`variant-qc`.
The array of genotype structs with a ``conditionalQuality`` field
A struct containing ``mean``, ``stdDev``, ``min``, and ``max`` of genotype qualities
0.3.0
Converts an array of probabilities to hard calls.
Converts an array of probabilities to hard calls. The probabilities are assumed to be diploid. See :ref:`variant-data-transformations` for more details.
The array of probabilities to convert
The number of alternate alleles
Whether the probabilities are phased. If phased, we expect ``2 * numAlts`` values in the probabilities array. If unphased, we expect one probability per possible genotype.
The minimum probability to make a call. If no probability falls into the range of ``[0, 1 - threshold]`` or ``[threshold, 1]``, a no-call (represented by ``-1`` s) will be emitted. If not provided, this parameter defaults to ``0.9``.
An array of hard calls
0.3.0
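For the unphased diploid case, one probability per possible genotype means the probabilities must be matched to genotypes in a fixed order. The plain-Python sketch below assumes the VCF/BGEN ordering, in which genotype `(a, b)` with `a <= b` comes after all genotypes whose larger allele is below `b`. It calls the genotype whose probability reaches `threshold`, and otherwise emits a no-call.

This simplified rule is an assumption and does not model the phased case or the exact range rule described above. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def diploid_genotypes(num_alleles: int) -> List[Tuple[int, int]]:
    """Unphased diploid genotypes in VCF order: (0,0), (0,1), (1,1), (0,2), (1,2), (2,2), ..."""
    return [(a, b) for b in range(num_alleles) for a in range(b + 1)]


def hard_calls_unphased_sketch(probabilities: Sequence[float], num_alts: int, threshold: float = 0.9) -> List[int]:
    """Return the most likely genotype if its probability reaches threshold, else [-1, -1]."""
    genotypes = diploid_genotypes(num_alts + 1)
    if len(probabilities) != len(genotypes):
        raise ValueError(f"expected {len(genotypes)} probabilities, got {len(probabilities)}")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] >= threshold:
        return list(genotypes[best])
    return [-1, -1]  # no genotype is confident enough: emit a no-call
```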
Computes statistics relating to the Hardy-Weinberg equilibrium.
Computes statistics relating to the Hardy-Weinberg equilibrium. See :ref:`variant-qc` for more details.
The array of genotype structs with a ``calls`` field
A struct containing two fields, ``hetFreqHwe`` (the expected heterozygous frequency according to Hardy-Weinberg equilibrium) and ``pValueHwe`` (the associated p-value)
0.3.0
Performs liftover for the coordinates of a variant.
Performs liftover for the coordinates of a variant. To perform liftover of alleles and add additional metadata, see :ref:`liftover`.
The current contig name
The current start
The current end
Location of the chain file on each node in the cluster
Minimum fraction of bases that must remap to do liftover successfully. If not provided, defaults to ``0.95``.
A struct containing ``contigName``, ``start``, and ``end`` fields after liftover
0.3.0
Performs a linear regression association test optimized for performance in a GWAS setting.
Performs a linear regression association test optimized for performance in a GWAS setting. See :ref:`linear-regression` for details.
A numeric array of genotypes
A numeric array of phenotypes
A spark.ml ``Matrix`` of covariates
A struct containing ``beta``, ``standardError``, and ``pValue`` fields. See :ref:`linear-regression`.
0.3.0
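The ``beta`` and ``standardError`` of such a test are the genotype coefficient and its standard error from an ordinary least-squares fit of the phenotypes on the covariates plus the genotype. The NumPy sketch below fits one variant in the textbook way. It is not Glow's optimized implementation.

It assumes the covariate matrix already contains an intercept column of ones. Whether the real function adds one is not stated above. The name `linear_regression_sketch` is hypothetical.

```python
import numpy as np


def linear_regression_sketch(genotypes, phenotypes, covariates):
    """OLS of phenotypes on [covariates, genotype]; returns the genotype's beta and SE."""
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    c = np.asarray(covariates, dtype=float)  # assumed to include an intercept column
    x = np.column_stack([c, g])  # genotype is the last column
    n, p = x.shape
    xtx_inv = np.linalg.inv(x.T @ x)
    coef = xtx_inv @ x.T @ y
    residuals = y - x @ coef
    sigma2 = residuals @ residuals / (n - p)  # unbiased residual variance
    return {"beta": float(coef[-1]), "standardError": float(np.sqrt(sigma2 * xtx_inv[-1, -1]))}
```

A two-sided p-value would then follow from the t distribution with `n - p` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(beta / standardError), n - p)`.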
Performs a logistic regression association test optimized for performance in a GWAS setting.
Performs a logistic regression association test optimized for performance in a GWAS setting. See :ref:`logistic-regression` for more details.
A numeric array of genotypes
A double array of phenotype values
A spark.ml ``Matrix`` of covariates
Which logistic regression test to use. Can be ``LRT`` or ``Firth``
A struct containing ``beta``, ``oddsRatio``, ``waldConfidenceInterval``, and ``pValue`` fields. See :ref:`logistic-regression`.
0.3.0
Substitutes the missing values of a numeric array using the mean of the non-missing values.
Substitutes the missing values of a numeric array using the mean of the non-missing values. Any values that are NaN, null or equal to the missing value parameter are considered missing. See :ref:`variant-data-transformations` for more details.
A numeric array that may contain missing values
A value that should be considered missing. If not provided, this parameter defaults to ``-1``.
A numeric array with substituted missing values
0.4.0
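The three missing-value conditions (NaN, null, or equal to the missing value parameter) translate directly into a predicate. The plain-Python sketch below applies that predicate and fills the gaps with the mean of the remaining values. Returning the input unchanged when every value is missing is an assumption, since that case is not covered above. The name `mean_substitute_sketch` is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def mean_substitute_sketch(values: Sequence[Optional[float]], missing_value: float = -1.0) -> List[Optional[float]]:
    """Replace NaN, None and `missing_value` entries with the mean of the other entries."""

    def is_missing(v: Optional[float]) -> bool:
        return v is None or math.isnan(v) or v == missing_value

    present = [v for v in values if not is_missing(v)]
    if not present:
        return list(values)  # assumption: nothing to average, leave the array as-is
    mean = sum(present) / len(present)
    return [mean if is_missing(v) else float(v) for v in values]
```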
Normalizes the variant with a behavior similar to vt normalize or bcftools norm.
Creates a StructType column including the normalized ``start``, ``end``, ``referenceAllele`` and ``alternateAlleles`` fields (whether they are changed or unchanged as the result of normalization) as well as a StructType field called ``normalizationStatus`` that contains the following fields:
- ``changed``: A boolean field indicating whether the variant data was changed as a result of normalization
- ``errorMessage``: An error message in case the attempt at normalizing the row hit an error. In this case, the ``changed`` field will be set to ``false``. If no errors occur, this field will be ``null``.
In case of an error, the ``start``, ``end``, ``referenceAllele`` and ``alternateAlleles`` fields in the generated struct will be ``null``.
The current contig name
The current start
The current end
The current reference allele
The current array of alternate alleles
A path to the reference genome ``.fasta`` file. The ``.fasta`` file must be accompanied by a ``.fai`` index file in the same folder.
A struct as explained above
0.3.0
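Tools like vt normalize and bcftools norm both trim bases shared by all alleles and left-align indels against the reference. The plain-Python sketch below illustrates only the trimming part. It removes a shared last base, then a shared first base, while every allele keeps at least one base. It does not left-shift, which would need the reference ``.fasta`` file.

It assumes 0-based ``start`` and exclusive ``end`` coordinates, so ``end = start + len(referenceAllele)``. The name `trim_alleles_sketch` is hypothetical and this is not Glow's algorithm.

```python
from typing import List, Sequence, Tuple


def trim_alleles_sketch(start: int, ref: str, alts: Sequence[str]) -> Tuple[int, int, str, List[str]]:
    """Trim bases shared by all alleles; returns (start, end, ref, alts). No left-shifting."""
    alleles = [ref] + list(alts)
    # Right-trim: drop a shared last base while every allele keeps at least one base.
    while all(len(a) > 1 for a in alleles) and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    # Left-trim: drop a shared first base and move the start position forward.
    while all(len(a) > 1 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        start += 1
    # Assumption: 0-based start with an exclusive end.
    return start, start + len(alleles[0]), alleles[0], alleles[1:]
```

For example, a reference ``ACGT`` with alternate ``AGGT`` at start 100 reduces to the single-base substitution ``C`` to ``G`` at start 101.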
Computes per-sample call summary statistics.
Computes per-sample call summary statistics. See :ref:`sample-qc` for more details.
An array of genotype structs with a ``calls`` field
The reference allele
An array of alternate alleles
A struct containing ``sampleId``, ``callRate``, ``nCalled``, ``nUncalled``, ``nHomRef``, ``nHet``, ``nHomVar``, ``nSnp``, ``nInsertion``, ``nDeletion``, ``nTransition``, ``nTransversion``, ``nSpanningDeletion``, ``rTiTv``, ``rInsertionDeletion``, and ``rHetHomVar`` fields. See :ref:`sample-qc`.
0.3.0
Computes per-sample summary statistics about the depth field in an array of genotype structs.
An array of genotype structs with a ``depth`` field
An array of structs where each struct contains ``mean``, ``stDev``, ``min``, and ``max`` of the genotype depths for a sample. If ``sampleId`` is present in a genotype, it will be propagated to the resulting struct as an extra field.
0.3.0
Computes per-sample summary statistics about the genotype quality field in an array of genotype structs.
An array of genotype structs with a ``conditionalQuality`` field
An array of structs where each struct contains ``mean``, ``stDev``, ``min``, and ``max`` of the genotype qualities for a sample. If ``sampleId`` is present in a genotype, it will be propagated to the resulting struct as an extra field.
0.3.0
Selects fields from a struct.
Struct from which to select fields
Fields to select
A struct containing only the indicated fields
0.3.0
Converts a spark.ml ``Vector`` (sparse or dense) to an array of doubles.
Vector to convert
An array of doubles
0.3.0
Functions provided by Glow. These functions can be used with Spark's DataFrame API.