An expression that allows users to aggregate over all array elements at a specific index in an array column. For example, this expression can be used to compute per-sample summary statistics from a genotypes column.
The user must provide the following arguments:
- The array for aggregation
- The initialValue for each element in the per-index buffer
- An update function to update the buffer with a new element
- A merge function to combine two buffers
The user may optionally provide an evaluate function. If it's not provided, the identity function is used.
Example usage to calculate average depth across all sites for a sample:

aggregate_by_index(
  genotypes,
  named_struct('sum', 0l, 'count', 0l),
  (buf, genotype) -> named_struct('sum', buf.sum + genotype.depth, 'count', buf.count + 1),
  (buf1, buf2) -> named_struct('sum', buf1.sum + buf2.sum, 'count', buf1.count + buf2.count),
  buf -> buf.sum / buf.count)
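The fold/merge/evaluate semantics can be modeled in plain Python. The function names and the simplified genotype representation below are illustrative sketches of the behavior described above, not Glow's actual API:

```python
# Pure-Python model of aggregate-by-index semantics (illustrative only).

def fold_partition(rows, initial_value, update):
    """Fold each row's array element-wise into a per-index buffer."""
    buffers = []
    for row in rows:
        for i, element in enumerate(row):
            if i >= len(buffers):
                buffers.append(initial_value())
            buffers[i] = update(buffers[i], element)
    return buffers

def aggregate_by_index(partitions, initial_value, update, merge,
                       evaluate=lambda buf: buf):  # identity if not provided
    partials = [fold_partition(rows, initial_value, update) for rows in partitions]
    merged = partials[0]
    for other in partials[1:]:
        merged = [merge(a, b) for a, b in zip(merged, other)]
    return [evaluate(buf) for buf in merged]

# Average depth per sample across sites, mirroring the SQL example.
partitions = [
    [[{"depth": 10}, {"depth": 20}]],  # partition 1: one site, two samples
    [[{"depth": 30}, {"depth": 40}]],  # partition 2: one site, two samples
]
mean_depths = aggregate_by_index(
    partitions,
    lambda: {"sum": 0, "count": 0},
    lambda buf, g: {"sum": buf["sum"] + g["depth"], "count": buf["count"] + 1},
    lambda b1, b2: {"sum": b1["sum"] + b2["sum"], "count": b1["count"] + b2["count"]},
    lambda buf: buf["sum"] / buf["count"])
print(mean_depths)  # [20.0, 30.0]
```

The merge function is what allows the per-index buffers from different partitions to be combined, which is why it is required alongside the update function.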
Computes summary statistics per-sample in a genomic cohort. These statistics include the call rate and the number of different types of variants.
The return type is an array of summary statistics. If sample ids are included in the input schema, they'll be propagated to the results.
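As an illustration of one such statistic, the per-sample call rate can be sketched in plain Python. The genotype encoding here (a per-site list of per-sample calls arrays, with -1 marking a missing call) is an assumption for illustration:

```python
# Illustrative sketch: call rate = fraction of sites with no missing calls.

def call_rates(sites):
    """Per-sample call rate across all sites; -1 encodes a missing call."""
    n_samples = len(sites[0])
    called = [0] * n_samples
    for site in sites:
        for i, calls in enumerate(site):
            if all(c != -1 for c in calls):
                called[i] += 1
    return [c / len(sites) for c in called]

sites = [
    [[0, 1], [1, 1]],    # site 1: both samples fully called
    [[-1, 1], [0, 0]],   # site 2: sample 0 has a missing call
]
print(call_rates(sites))  # [0.5, 1.0]
```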
Context that can be computed once for all variant sites for a linear regression GWAS analysis.
Expands all the fields of a potentially unnamed struct.
Explodes a matrix by row. Each row of the input matrix will be output as an array of doubles.
If the input expression is null or has 0 rows, the output will be empty.
The matrix to explode. May be dense or sparse.
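A minimal sketch of this row-wise explosion, modeling a dense matrix as a list of rows; the null and empty-input behavior mirrors the description above:

```python
# Sketch only: a dense matrix is modeled as a list of numeric rows.

def explode_matrix(matrix):
    # A null input or a matrix with 0 rows produces no output rows.
    if matrix is None:
        return []
    return [[float(x) for x in row] for row in matrix]

rows = explode_matrix([[1, 2], [3, 4]])
print(rows)  # [[1.0, 2.0], [3.0, 4.0]]
```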
Converts a complex genotype array into an array of ints. Each element is the sum of the calls array for the sample at that position, or -1 if any of that sample's calls are missing.
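The conversion can be sketched as follows; the dict-based genotype representation and function names are assumptions for illustration:

```python
# Illustrative sketch of the calls-to-int conversion described above.

def genotype_to_state(calls):
    """Sum of the calls, or -1 if any call is missing (encoded as -1)."""
    return -1 if any(c == -1 for c in calls) else sum(calls)

def genotype_states(genotypes):
    return [genotype_to_state(g["calls"]) for g in genotypes]

states = genotype_states([{"calls": [0, 1]}, {"calls": [1, 1]}, {"calls": [-1, 1]}])
print(states)  # [1, 2, -1]
```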
Converts an array of probabilities (most likely the genotype probabilities from a BGEN file) into hard calls. The input probabilities are assumed to be diploid.
If the input probabilities are phased, each haplotype is called separately by finding the maximum probability greater than the threshold (0.9 by default, matching plink). If no probability is greater than the threshold, the call is -1 (missing).
If the input probabilities are unphased, the probabilities refer to the complete genotype. In this case, we find the maximum probability greater than the threshold and then convert that value to a genotype call.
If any of the required parameters (probabilities, numAlts, phased) are null, the expression returns null.
The probabilities to convert to hard calls. The algorithm does not check that they sum to 1. If the probabilities are unphased, they are assumed to correspond to the genotypes in colex order, which is standard for both BGEN and VCF files.
The number of alternate alleles at this site.
Whether the probabilities are phased (per haplotype) or unphased (whole genotype).
Calls are only generated if at least one probability is above this threshold.
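A hedged sketch of this logic, limited to a diploid, biallelic site (numAlts = 1). The per-haplotype probability layout for phased data and the colex genotype order [0/0, 0/1, 1/1] for unphased data are assumptions drawn from the description above:

```python
# Simplified hard-call sketch for numAlts = 1 (biallelic, diploid).

def hard_calls(probabilities, phased, threshold=0.9):
    if phased:
        # Assumed layout: per-haplotype probabilities over [ref, alt],
        # i.e. [hap1_ref, hap1_alt, hap2_ref, hap2_alt]. Each haplotype
        # is called independently.
        calls = []
        for hap in (probabilities[:2], probabilities[2:]):
            best = max(range(len(hap)), key=lambda i: hap[i])
            calls.append(best if hap[best] > threshold else -1)
        return calls
    # Unphased: probabilities cover whole genotypes in colex order.
    genotypes = [[0, 0], [0, 1], [1, 1]]
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return genotypes[best] if probabilities[best] > threshold else [-1, -1]

print(hard_calls([0.05, 0.95, 0.0], phased=False))  # [0, 1]
print(hard_calls([0.2, 0.5, 0.3], phased=False))    # [-1, -1]
```

Note that the sketch does not check that the probabilities sum to 1, matching the behavior described for the probabilities parameter.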
Performs lift over from the specified 0-start, half-open interval (contigName, start, end) on the reference sequence to a query sequence, using the specified chain file and minimum fraction of bases that must remap.
We assume the chain file is a constant value so that the LiftOver object can be reused between rows.
If any of the required parameters (contigName, start, end) are null, the expression returns null. If minMatchRatioOpt contains null, the expression returns null; if it is empty, we use 0.95 to match LiftOver.DEFAULT_LIFTOVER_MINMATCH.
Chromosome name on the reference sequence.
Start position (0-start) on the reference sequence.
End position on the reference sequence.
UCSC chain format file mapping blocks from the reference sequence to the query sequence.
The minimum fraction of bases that must remap to lift over successfully.
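The minimum-match criterion can be illustrated with a small sketch. The (start, end) blocks below are a hypothetical stand-in for the remappable alignment blocks of a parsed chain file, which is a simplification of the real UCSC chain format:

```python
# Illustrative sketch of the minimum-match check for lift over.

def passes_min_match(start, end, blocks, min_match_ratio=0.95):
    """0-start, half-open interval; blocks are (start, end) spans that remap."""
    total = end - start  # assumes a non-empty interval
    remapped = sum(max(0, min(end, b_end) - max(start, b_start))
                   for b_start, b_end in blocks)
    return remapped / total >= min_match_ratio

# 95 of 100 bases fall inside a remappable block: exactly at the 0.95 default.
print(passes_min_match(100, 200, [(0, 195)]))  # True
print(passes_min_match(100, 200, [(0, 150)]))  # False
```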
Base trait for logistic regression tests
Statistics returned upon performing a logit test.
Log-odds associated with the genotype, NaN if the null/full model fit failed
Odds ratio associated with the genotype, NaN if the null/full model fit failed
Wald 95% confidence interval of the odds ratio, NaN if the null/full model fit failed
P-value for the specified test, NaN if the null/full model fit failed. Determined using the profile likelihood method.
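For context, the odds ratio and its Wald 95% confidence interval follow directly from the fitted log-odds (beta) and its standard error. This arithmetic sketch assumes those two values come from a fitted model; it is not the library's implementation:

```python
import math

def odds_ratio_and_wald_ci(beta, standard_error, z=1.96):
    # Odds ratio is exp(log-odds); the Wald CI exponentiates beta +/- z * SE.
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - z * standard_error),
          math.exp(beta + z * standard_error))
    return odds_ratio, ci

# beta = 0 corresponds to an odds ratio of 1 (no association).
print(odds_ratio_and_wald_ci(0.0, 0.5)[0])  # 1.0
```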
Substitutes the missing values of an array using the mean of the non-missing values. Values that are NaN, null, or equal to the missing value parameter are considered missing; they are excluded from the mean computation and replaced with that mean. If all values are missing, they are substituted with the missing value.
If the missing value is not provided, the parameter defaults to -1.
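The described behavior can be sketched in plain Python (the function name is illustrative):

```python
import math

# Illustrative sketch of mean substitution as described above.

def mean_substitute(values, missing_value=-1):
    def is_missing(v):
        return (v is None
                or (isinstance(v, float) and math.isnan(v))
                or v == missing_value)
    present = [v for v in values if not is_missing(v)]
    if not present:
        # All values missing: substitute with the missing value itself.
        return [missing_value] * len(values)
    mean = sum(present) / len(present)
    return [mean if is_missing(v) else v for v in values]

print(mean_substitute([1.0, None, 3.0, -1]))  # [1.0, 2.0, 3.0, 2.0]
```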
The state necessary for maintaining moment-based aggregations, currently supported only up to m2.
This functionality is based on the org.apache.spark.sql.catalyst.expressions.aggregate.CentralMomentAgg implementation in Spark and is used to compute summary statistics both on arrays and across many rows for sample-based aggregations.
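A simplified model of such a moment buffer, with the element-wise update and the buffer merge used when combining partial aggregations. This mirrors the standard streaming-moments recurrences rather than Spark's actual code:

```python
# Simplified count/mean/m2 moment buffer (Welford update, pairwise merge).

class MomentState:
    def __init__(self):
        self.count, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        # Incorporate one new element into the running moments.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # Combine two partial buffers (e.g. from different partitions).
        if other.count == 0:
            return self
        delta = other.mean - self.mean
        total = self.count + other.count
        self.mean += delta * other.count / total
        self.m2 += other.m2 + delta * delta * self.count * other.count / total
        self.count = total
        return self

    def variance(self):
        # Sample variance from the second central moment.
        return self.m2 / (self.count - 1)
```

Because merge is associative, partial buffers can be combined in any order, which is what makes the aggregation usable in a distributed setting.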
Computes summary statistics (count, min, max, mean, stdev) for a numeric genotype field for each sample in a cohort. The field is determined by the provided StructField. If the field does not exist in the genotype struct, an analysis error will be thrown.
The return type is an array of summary statistics. If sample ids are included in the input, they'll be propagated to the results.
A hack to make Spark SQL recognize AggregateByIndex as an aggregate expression.
See io.projectglow.sql.optimizer.ResolveAggregateFunctionsRule for details.
Some of the logic used for logistic regression is from the Hail project. The Hail project can be found on Github: https://github.com/hail-is/hail. The Hail project is under an MIT license: https://github.com/hail-is/hail/blob/master/LICENSE.
Contains implementations of QC functions. These implementations are called during both whole-stage codegen and interpreted execution.
The functions are exposed to the user as Catalyst expressions.
Implementations of utility functions for transforming variant representations. These implementations are called during both whole-stage codegen and interpreted execution.
The functions are exposed to the user as Catalyst expressions.
Expression that adds fields to an existing struct.
At optimization time, this expression is rewritten as the creation of a new struct with all the fields of the existing struct as well as the new fields. See io.projectglow.sql.optimizer.ReplaceExpressionsRule for more details.