



package expressions


Type Members

  1. case class AddStructFields(struct: Expression, newFields: Seq[Expression]) extends Expression with RewriteAfterResolution with Product with Serializable


    Expression that adds fields to an existing struct.

    At optimization time, this expression is rewritten as the creation of a new struct with all the fields of the existing struct as well as the new fields. See io.projectglow.sql.optimizer.ReplaceExpressionsRule for more details.

  2. trait AggregateByIndex extends DeclarativeAggregate with HigherOrderFunction


    An expression that allows users to aggregate over all array elements at a specific index in an array column. For example, this expression can be used to compute per-sample summary statistics from a genotypes column.

    The user must provide the following arguments:

    - The array for aggregation
    - The initialValue for each element in the per-index buffer
    - An update function to update the buffer with a new element
    - A merge function to combine two buffers

    The user may optionally provide an evaluate function. If it's not provided, the identity function is used.

    Example usage to calculate average depth across all sites for a sample:

        aggregate_by_index(
          genotypes,
          named_struct('sum', 0l, 'count', 0l),
          (buf, genotype) -> named_struct('sum', buf.sum + genotype.depth, 'count', buf.count + 1),
          (buf1, buf2) -> named_struct('sum', buf1.sum + buf2.sum, 'count', buf1.count + buf2.count),
          buf -> buf.sum / buf.count)
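
    The semantics described above can be sketched in plain Scala, without Spark. This is only an illustration under stated assumptions: the object AggregateByIndexSketch, the helper aggregateByIndex, and the Buf case class are hypothetical names, and splitting the rows into two halves merely stands in for the partitions that Spark would merge.

```scala
object AggregateByIndexSketch {
  // Hypothetical buffer type mirroring named_struct('sum', ..., 'count', ...).
  final case class Buf(sum: Long, count: Long)

  // Element i of every row is folded into buffer i. All rows are assumed
  // to have the same length.
  def aggregateByIndex[A, B, R](
      rows: Seq[Seq[A]],
      initialValue: B,
      update: (B, A) => B,
      merge: (B, B) => B,
      evaluate: B => R): Seq[R] = {
    if (rows.isEmpty) {
      Seq.empty
    } else {
      val width = rows.head.length
      // Builds the per-index buffers for one "partition" of rows.
      def partial(part: Seq[Seq[A]]): Vector[B] =
        part.foldLeft(Vector.fill(width)(initialValue)) { (bufs, row) =>
          require(row.length == width, "all rows must have the same length")
          bufs.zip(row).map { case (b, a) => update(b, a) }
        }
      // Two halves stand in for two partitions whose buffers are merged.
      val (left, right) = rows.splitAt(rows.length / 2)
      partial(left).zip(partial(right)).map { case (l, r) => evaluate(merge(l, r)) }
    }
  }

  // Mean depth per sample, where each row holds one site's per-sample depths.
  def meanDepths(depthsPerSite: Seq[Seq[Long]]): Seq[Double] =
    aggregateByIndex[Long, Buf, Double](
      depthsPerSite,
      Buf(0L, 0L),
      (buf, depth) => Buf(buf.sum + depth, buf.count + 1),
      (b1, b2) => Buf(b1.sum + b2.sum, b1.count + b2.count),
      buf => buf.sum.toDouble / buf.count)
}
```

    With three sites and two samples, meanDepths(Seq(Seq(10L, 20L), Seq(30L, 40L), Seq(50L, 60L))) yields one mean per sample index.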

  3. case class ArrayStatsSummary(array: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable

  4. case class ArrayToDenseVector(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable

  5. case class ArrayToSparseVector(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable

  6. case class CallStats(genotypes: Expression, genotypeInfo: Option[GenotypeInfo]) extends UnaryExpression with ExpectsGenotypeFields with Product with Serializable

  7. case class CallStatsStruct(callRate: Double, nCalled: Int, nUncalled: Int, nHet: Int, nHomozygous: Array[Int], nNonRef: Int, nAllelesCalled: Int, alleleCounts: Array[Int], alleleFrequencies: Array[Double]) extends Product with Serializable

  8. case class CallSummaryStats(genotypes: Expression, refAllele: Expression, altAlleles: Expression, genotypeInfo: Option[GenotypeInfo], mutableAggBufferOffset: Int, inputAggBufferOffset: Int) extends TypedImperativeAggregate[ArrayBuffer[SampleCallStats]] with ExpectsGenotypeFields with GlowLogging with Product with Serializable


    Computes summary statistics per-sample in a genomic cohort. These statistics include the call rate and the number of different types of variants.

    The return type is an array of summary statistics. If sample ids are included in the input schema, they'll be propagated to the results.

  9. case class CovariateQRContext(covQt: DenseMatrix[Double], degreesOfFreedom: Int) extends Product with Serializable


    Context that can be computed once for all variant sites for a linear regression GWAS analysis.

  10. case class DpSummaryStats(child: Expression) extends Expression with Rewrite with Product with Serializable

  11. case class ExpandStruct(struct: Expression) extends Expression with Unevaluable with Product with Serializable


    Expands all the fields of a potentially unnamed struct.

  12. case class ExplodeMatrix(matrixExpr: Expression) extends Expression with Generator with CodegenFallback with ExpectsInputTypes with Product with Serializable


    Explodes a matrix by row. Each row of the input matrix will be output as an array of doubles.

    If the input expression is null or has 0 rows, the output will be empty.


    matrixExpr: The matrix to explode. May be dense or sparse.
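
    A minimal sketch of exploding by row, written in plain Scala. ColMajorMatrix is a hypothetical stand-in for a dense matrix stored in column-major order, and None stands in for a null input; neither is part of this package.

```scala
object ExplodeMatrixSketch {
  // Hypothetical dense matrix: entry (i, j) lives at values(j * numRows + i).
  final case class ColMajorMatrix(numRows: Int, numCols: Int, values: Array[Double])

  // Emits one array of doubles per matrix row; a missing matrix or a
  // matrix with 0 rows produces no output rows.
  def explode(matrix: Option[ColMajorMatrix]): Seq[Seq[Double]] = matrix match {
    case None => Seq.empty
    case Some(m) =>
      (0 until m.numRows).map { i =>
        (0 until m.numCols).map(j => m.values(j * m.numRows + i))
      }
  }
}
```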

  13. case class FirthFit(fitState: FirthNewtonArgs, logLkhd: Double, converged: Boolean, exploded: Boolean) extends Product with Serializable

  14. case class FirthFitState(x: DenseMatrix[Double], nullFitArgs: FirthNewtonArgs, fullFitArgs: FirthNewtonArgs) extends Product with Serializable

  15. class FirthNewtonArgs extends AnyRef

  16. case class GenotypeStates(genotypes: Expression, genotypeInfo: Option[GenotypeInfo]) extends UnaryExpression with ExpectsGenotypeFields with Product with Serializable


    Converts a complex genotype array into an array of ints, where each element is the sum of the calls array for the sample at that position if no calls are missing, or -1 if any calls are missing.
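
    The conversion can be sketched in plain Scala as follows. The sketch assumes that a missing call is encoded as a negative value (for example -1) in the calls array; the object and method names are hypothetical.

```scala
object GenotypeStatesSketch {
  // Sum of the calls for one sample, or -1 if any call is missing.
  // Assumption: missing calls are encoded as negative integers.
  def genotypeState(calls: Seq[Int]): Int =
    if (calls.exists(_ < 0)) -1 else calls.sum

  // One state per sample, in the order of the input genotypes array.
  def genotypeStates(genotypes: Seq[Seq[Int]]): Seq[Int] =
    genotypes.map(genotypeState)
}
```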

  17. case class GqSummaryStats(child: Expression) extends Expression with Rewrite with Product with Serializable

  18. case class HardCalls(probabilities: Expression, numAlts: Expression, phased: Expression, threshold: Expression) extends Expression with CodegenFallback with ImplicitCastInputTypes with Product with Serializable


    Converts an array of probabilities (most likely the genotype probabilities from a BGEN file) into hard calls. The input probabilities are assumed to be diploid.

    If the input probabilities are phased, each haplotype is called separately by finding the maximum probability greater than the threshold (0.9 by default, a la plink). If no probability is greater than the threshold, the call is -1 (missing).

    If the input probabilities are unphased, the probabilities refer to the complete genotype. In this case, we find the maximum probability greater than the threshold and then convert that value to a genotype call.

    If any of the required parameters (probabilities, numAlts, phased) are null, the expression returns null.


    probabilities: The probabilities to convert to hard calls. The algorithm does not check that they sum to 1. If the probabilities are unphased, they are assumed to correspond to the genotypes in colex order, which is standard for both BGEN and VCF files.


    numAlts: The number of alternate alleles at this site.


    phased: Whether the probabilities are phased (per haplotype) or unphased (whole genotype).


    threshold: Calls are only generated if at least one probability is above this threshold.
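
    The calling rules above can be sketched in plain Scala. This is an illustration rather than the expression's implementation. It assumes that phased input is laid out as consecutive blocks of numAlts + 1 allele probabilities, one block per haplotype, and that unphased input is one diploid genotype distribution in colex order. Null handling is omitted and all names are hypothetical.

```scala
object HardCallsSketch {
  // Index of the largest probability if it exceeds the threshold, else -1.
  private def argmaxAbove(ps: Seq[Double], threshold: Double): Int =
    if (ps.isEmpty) -1
    else {
      val (p, i) = ps.zipWithIndex.maxBy(_._1)
      if (p > threshold) i else -1
    }

  // Decodes a diploid colex index (0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...)
  // into the allele pair (a, b) with a <= b, where idx = b * (b + 1) / 2 + a.
  private def colexToAlleles(idx: Int): Seq[Int] = {
    var b = 0
    while ((b + 1) * (b + 2) / 2 <= idx) b += 1
    Seq(idx - b * (b + 1) / 2, b)
  }

  def hardCalls(
      probabilities: Seq[Double],
      numAlts: Int,
      phased: Boolean,
      threshold: Double = 0.9): Seq[Int] =
    if (phased) {
      // Assumed layout: one block of numAlts + 1 probabilities per haplotype.
      probabilities.grouped(numAlts + 1).map(argmaxAbove(_, threshold)).toList
    } else {
      val idx = argmaxAbove(probabilities, threshold)
      if (idx < 0) Seq(-1, -1) else colexToAlleles(idx)
    }
}
```

    For a biallelic site, unphased probabilities (0.02, 0.95, 0.03) give the heterozygous call (0, 1), while (0.5, 0.4, 0.1) give the missing call (-1, -1) because no probability exceeds 0.9.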

  19. case class HardyWeinberg(genotypes: Expression, genotypeInfo: Option[GenotypeInfo]) extends UnaryExpression with ExpectsGenotypeFields with Product with Serializable

  20. case class HardyWeinbergStruct(hetFreqHwe: Double, pValueHwe: Double) extends Product with Serializable

  21. case class LRTFitState(x: DenseMatrix[Double], hessian: DenseMatrix[Double], nullFit: NewtonResult, newtonState: NewtonIterationsState) extends Product with Serializable

  22. case class LiftOverCoordinatesExpr(contigName: Expression, start: Expression, end: Expression, chainFile: Expression, minMatchRatio: Expression) extends Expression with CodegenFallback with ImplicitCastInputTypes with Product with Serializable


    Performs lift over from the specified 0-start, half-open interval (contigName, start, end) on the reference sequence to a query sequence, using the specified chain file and minimum fraction of bases that must remap.

    We assume the chain file is a constant value so that the LiftOver object can be reused between rows.

    If any of the required parameters (contigName, start, end) are null, the expression returns null. If minMatchRatioOpt contains null, the expression returns null; if it is empty, we use 0.95 to match LiftOver.DEFAULT_LIFTOVER_MINMATCH.


    contigName: Chromosome name on the reference sequence.


    start: Start position (0-start) on the reference sequence.


    end: End position on the reference sequence.


    chainFile: UCSC chain format file mapping blocks from the reference sequence to the query sequence.


    minMatchRatio: The minimum fraction of bases that must remap to lift over successfully.

  23. case class LinearRegressionExpr(genotypes: Expression, phenotypes: Expression, covariates: Expression) extends TernaryExpression with ImplicitCastInputTypes with Product with Serializable

  24. case class LogisticRegressionExpr(genotypes: Expression, phenotypes: Expression, covariates: Expression, test: Expression) extends QuaternaryExpression with ImplicitCastInputTypes with Product with Serializable

  25. class LogisticRegressionState extends AnyRef

  26. trait LogitTest extends Serializable


    Base trait for logistic regression tests.

  27. case class LogitTestResults(beta: Double, oddsRatio: Double, waldConfidenceInterval: Seq[Double], pValue: Double) extends Product with Serializable


    Statistics returned upon performing a logit test.


    beta: Log-odds associated with the genotype, NaN if the null/full model fit failed.


    oddsRatio: Odds ratio associated with the genotype, NaN if the null/full model fit failed.


    waldConfidenceInterval: Wald 95% confidence interval of the odds ratio, NaN if the null/full model fit failed.


    pValue: P-value for the specified test, NaN if the null/full model fit failed. Determined using the profile likelihood method.
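
    As a rough illustration of how the first three fields relate, the textbook definitions are oddsRatio = exp(beta) and a Wald 95% interval of exp(beta ± 1.96 * SE). The sketch below assumes those standard formulas. The standardError input and the Summary case class are hypothetical, and the p-value is left out because it depends on the chosen test.

```scala
object WaldIntervalSketch {
  // Hypothetical container for the three fields derived from beta.
  final case class Summary(
      beta: Double,
      oddsRatio: Double,
      waldConfidenceInterval: Seq[Double])

  // standardError is assumed to be the standard error of beta on the
  // log-odds scale. NaN inputs propagate to NaN outputs.
  def summarize(beta: Double, standardError: Double): Summary = {
    val z = 1.96 // approximate 97.5th percentile of the standard normal
    Summary(
      beta,
      math.exp(beta),
      Seq(math.exp(beta - z * standardError), math.exp(beta + z * standardError)))
  }
}
```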

  28. case class MeanSubstitute(array: Expression, missingValue: Expression) extends Expression with RewriteAfterResolution with Product with Serializable


    Substitutes the missing values of an array using the mean of the non-missing values. Values that are NaN, null or equal to the missing value parameter are not included in the aggregation, and are substituted with the mean of the non-missing values. If all values are missing, they are substituted with the missing value.

    If the missing value is not provided, the parameter defaults to -1.
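
    The substitution rule can be sketched in plain Scala. Here None stands in for a null array element, and the object and method names are hypothetical.

```scala
object MeanSubstituteSketch {
  // Replaces null (None), NaN, and missingValue entries with the mean of
  // the remaining entries. If every entry is missing, missingValue is used.
  def meanSubstitute(
      values: Seq[Option[Double]],
      missingValue: Double = -1.0): Seq[Double] = {
    def isMissing(x: Double): Boolean = x.isNaN || x == missingValue
    val present = values.flatten.filterNot(isMissing)
    val fill = if (present.isEmpty) missingValue else present.sum / present.length
    values.map(v => v.filterNot(isMissing).getOrElse(fill))
  }
}
```

    For example, with the default missing value, the input (1.0, -1.0, null, 3.0, NaN) becomes (1.0, 2.0, 2.0, 3.0, 2.0).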

  29. case class MomentAggState(count: Long = 0, min: Double = 0, max: Double = 0, mean: Double = 0, m2: Double = 0) extends Product with Serializable


    The state necessary for maintaining moment-based aggregations, currently only supported up to m2.

    This functionality is based on the org.apache.spark.sql.catalyst.expressions.aggregate.CentralMomentAgg implementation in Spark and is used to compute summary statistics on arrays as well as across many rows for sample-based aggregations.
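
    A minimal sketch of this kind of state, assuming the usual streaming update (Welford) and pairwise merge formulas for the mean and the second central moment m2. The Moments case class is a hypothetical stand-in that reuses the field names above; it is not the MomentAggState implementation.

```scala
object MomentSketch {
  final case class Moments(
      count: Long = 0,
      min: Double = 0,
      max: Double = 0,
      mean: Double = 0,
      m2: Double = 0) {

    // Folds one new value into the state.
    def update(x: Double): Moments = {
      val n = count + 1
      val delta = x - mean
      val newMean = mean + delta / n
      Moments(
        n,
        if (count == 0) x else math.min(min, x),
        if (count == 0) x else math.max(max, x),
        newMean,
        m2 + delta * (x - newMean))
    }

    // Combines two states built from disjoint sets of values.
    def merge(other: Moments): Moments =
      if (count == 0) other
      else if (other.count == 0) this
      else {
        val n = count + other.count
        val delta = other.mean - mean
        Moments(
          n,
          math.min(min, other.min),
          math.max(max, other.max),
          mean + delta * other.count / n,
          m2 + other.m2 + delta * delta * count * other.count / n)
      }

    // Sample standard deviation; undefined for fewer than two values.
    def sampleStdev: Double =
      if (count > 1) math.sqrt(m2 / (count - 1)) else Double.NaN
  }
}
```

    Folding values with update and combining partial states with merge give the same result up to floating-point rounding, which is what allows the state to be used both within an array and across rows.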

  30. class NewtonIterationsState extends AnyRef

  31. case class NewtonResult(args: NewtonIterationsState, logLkhd: Double, nIter: Int, converged: Boolean, exploded: Boolean) extends Product with Serializable

  32. case class NormalizeVariantExpr(contigName: Expression, start: Expression, end: Expression, refAllele: Expression, altAlleles: Expression, refGenomePathString: Expression) extends SenaryExpression with ImplicitCastInputTypes with Product with Serializable

  33. case class PerSampleSummaryStatistics(genotypes: Expression, field: Expression, genotypeInfo: Option[GenotypeInfo] = None, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[ArrayBuffer[SampleSummaryStatsState]] with ExpectsGenotypeFields with GlowLogging with Product with Serializable


    Computes summary statistics (count, min, max, mean, stdev) for a numeric genotype field for each sample in a cohort. The field is determined by the provided StructField. If the field does not exist in the genotype struct, an analysis error will be thrown.

    The return type is an array of summary statistics. If sample ids are included in the input, they'll be propagated to the results.

  34. case class RegressionStats(beta: Double, standardError: Double, pValue: Double) extends Product with Serializable

  35. case class SampleCallStats(sampleId: String = null, nCalled: Long = 0, nUncalled: Long = 0, nHomRef: Long = 0, nHet: Long = 0, nHomVar: Long = 0, nInsertion: Long = 0, nDeletion: Long = 0, nTransversion: Long = 0, nTransition: Long = 0, nSpanningDeletion: Long = 0) extends Product with Serializable

  36. case class SampleDpSummaryStatistics(child: Expression) extends Expression with Rewrite with Product with Serializable

  37. case class SampleGqSummaryStatistics(child: Expression) extends Expression with Rewrite with Product with Serializable

  38. case class SampleSummaryStatsState(sampleId: String, momentAggState: MomentAggState) extends Product with Serializable

  39. case class SubsetStruct(struct: Expression, fields: Seq[Expression]) extends Expression with Rewrite with Product with Serializable

  40. case class UnwrappedAggregateByIndex(arr: Expression, initialValue: Expression, update: Expression, merge: Expression, evaluate: Expression) extends DeclarativeAggregate with AggregateByIndex with UnwrappedAggregateFunction with Product with Serializable

  41. trait UnwrappedAggregateFunction extends AggregateFunction


    A hack to make Spark SQL recognize AggregateByIndex as an aggregate expression.

    See io.projectglow.sql.optimizer.ResolveAggregateFunctionsRule for details.

  42. trait VariantType extends AnyRef

  43. case class VectorToArray(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable

  44. case class WrappedAggregateByIndex(arr: Expression, initialValue: Expression, update: Expression, merge: Expression, evaluate: Expression = LambdaFunction.identity) extends DeclarativeAggregate with AggregateByIndex with Product with Serializable


Value Members

  1. object ArrayToDenseVector extends Serializable

  2. object ArrayToSparseVector extends Serializable

  3. object CallStats extends Serializable

  4. object CovariateQRContext extends GlowLogging with Serializable

  5. object FirthTest extends LogitTest

  6. object HardyWeinberg extends Serializable

  7. object LikelihoodRatioTest extends LogitTest

  8. object LinearRegressionExpr extends Serializable

  9. object LinearRegressionGwas extends GlowLogging

  10. object LogisticRegressionExpr extends Serializable

  11. object LogisticRegressionGwas extends GlowLogging


    Some of the logic used for logistic regression is from the Hail project. The Hail project is hosted on GitHub and is released under an MIT license.

  12. object LogitTestResults extends Serializable

  13. object MomentAggState extends GlowLogging with Serializable

  14. object NormalizeVariantExpr extends Serializable

  15. object SampleCallStats extends GlowLogging with Serializable

  16. object VariantQcExprs extends GlowLogging


    Contains implementations of QC functions. These implementations are called during both whole-stage codegen and interpreted execution.

    The functions are exposed to the user as Catalyst expressions.

  17. object VariantType

  18. object VariantUtilExprs


    Implementations of utility functions for transforming variant representations. These implementations are called during both whole-stage codegen and interpreted execution.

    The functions are exposed to the user as Catalyst expressions.

  19. object VectorToArray extends Serializable

