Package

io.projectglow.sql

expressions

package expressions

Type Members

  1. case class AddStructFields(struct: Expression, newFields: Seq[Expression]) extends Expression with RewriteAfterResolution with Product with Serializable

    Expression that adds fields to an existing struct.

    At optimization time, this expression is rewritten as the creation of a new struct with all the fields of the existing struct as well as the new fields. See io.projectglow.sql.optimizer.ReplaceExpressionsRule for more details.

  2. trait AggregateByIndex extends DeclarativeAggregate with HigherOrderFunction

    An expression that allows users to aggregate over all array elements at a specific index in an array column. For example, this expression can be used to compute per-sample summary statistics from a genotypes column.

    The user must provide the following arguments:
      - The array for aggregation
      - The initialValue for each element in the per-index buffer
      - An update function to update the buffer with a new element
      - A merge function to combine two buffers

    The user may optionally provide an evaluate function. If it's not provided, the identity function is used.

    Example usage to calculate average depth across all sites for a sample:

      aggregate_by_index(
        genotypes,
        named_struct('sum', 0l, 'count', 0l),
        (buf, genotype) -> named_struct('sum', buf.sum + genotype.depth, 'count', buf.count + 1),
        (buf1, buf2) -> named_struct('sum', buf1.sum + buf2.sum, 'count', buf1.count + buf2.count),
        buf -> buf.sum / buf.count)
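
    The update/merge/evaluate flow described for AggregateByIndex can be sketched in plain Scala without Spark. This is only an illustration of the semantics, not Glow's implementation: the names AggregateByIndexSketch, Genotype, Buf, and aggregateByIndex are hypothetical, and the outer sequence of partitions is an assumed stand-in for groups of rows that are aggregated separately before their buffers are merged.

```scala
object AggregateByIndexSketch {
  // Hypothetical stand-ins for a genotype struct and the per-index buffer.
  final case class Genotype(depth: Long)
  final case class Buf(sum: Long, count: Long)

  /** Aggregates element i of every row into buffer i, then finalizes each buffer. */
  def aggregateByIndex[A, B, C](
      partitions: Seq[Seq[Seq[A]]],
      initialValue: B,
      update: (B, A) => B,
      merge: (B, B) => B,
      evaluate: B => C): Seq[C] = {
    // Within one partition, buffer i absorbs element i of each row.
    val partials: Seq[Seq[B]] = partitions.filter(_.nonEmpty).map { rows =>
      val init: Seq[B] = Seq.fill(rows.head.length)(initialValue)
      rows.foldLeft(init) { (bufs, row) =>
        bufs.zip(row).map { case (b, a) => update(b, a) }
      }
    }
    // Partial buffers are combined index by index, then evaluated.
    partials
      .reduceOption((l, r) => l.zip(r).map { case (x, y) => merge(x, y) })
      .getOrElse(Seq.empty[B])
      .map(evaluate)
  }

  // Three sites (rows) with two samples each, split across two partitions.
  val partitions: Seq[Seq[Seq[Genotype]]] = Seq(
    Seq(Seq(Genotype(10), Genotype(20)), Seq(Genotype(30), Genotype(40))),
    Seq(Seq(Genotype(50), Genotype(0)))
  )

  // Mean depth per sample, mirroring the SQL example above.
  val meanDepthPerSample: Seq[Double] = aggregateByIndex[Genotype, Buf, Double](
    partitions,
    Buf(0L, 0L),
    (buf, g) => Buf(buf.sum + g.depth, buf.count + 1),
    (b1, b2) => Buf(b1.sum + b2.sum, b1.count + b2.count),
    buf => buf.sum.toDouble / buf.count
  )
}
```

    Here meanDepthPerSample holds one value per sample index (30.0 and 20.0 for the sample data). The sketch divides as a Double, which is a choice made for the illustration.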

  3. case class ArrayStatsSummary(array: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable

  4. case class ArrayToDenseVector(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable

  5. case class ArrayToSparseVector(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable

  6. case class CallStats(genotypes: Expression, genotypeInfo: Option[GenotypeInfo]) extends UnaryExpression with ExpectsGenotypeFields with Product with Serializable

  7. case class CallStatsStruct(callRate: Double, nCalled: Int, nUncalled: Int, nHet: Int, nHomozygous: Array[Int], nNonRef: Int, nAllelesCalled: Int, alleleCounts: Array[Int], alleleFrequencies: Array[Double]) extends Product with Serializable

  8. case class CallSummaryStats(genotypes: Expression, refAllele: Expression, altAlleles: Expression, genotypeInfo: Option[GenotypeInfo], mutableAggBufferOffset: Int, inputAggBufferOffset: Int) extends TypedImperativeAggregate[ArrayBuffer[SampleCallStats]] with ExpectsGenotypeFields with GlowLogging with Product with Serializable

    Computes summary statistics per-sample in a genomic cohort. These statistics include the call rate and the number of different types of variants.

    The return type is an array of summary statistics. If sample ids are included in the input schema, they'll be propagated to the results.

  9. case class CovariateQRContext(covQt: DenseMatrix[Double], degreesOfFreedom: Int) extends Product with Serializable

    Context that can be computed once for all variant sites for a linear regression GWAS analysis.

  10. case class DpSummaryStats(child: Expression) extends Expression with Rewrite with Product with Serializable

  11. case class ExpandStruct(struct: Expression) extends Expression with Unevaluable with Product with Serializable

    Expands all the fields of a potentially unnamed struct.

  12. case class ExplodeMatrix(matrixExpr: Expression) extends Expression with Generator with CodegenFallback with ExpectsInputTypes with Product with Serializable

    Explodes a matrix by row. Each row of the input matrix will be output as an array of doubles.

    If the input expression is null or has 0 rows, the output will be empty.

    matrixExpr: The matrix to explode. May be dense or sparse.
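
    As a rough illustration of what exploding by row means, the sketch below uses a hypothetical dense matrix type in plain Scala. The names ExplodeMatrixSketch, Dense, and explodeByRow are not part of Glow, column-major storage is an assumption of the sketch, Option models a null input, and sparse matrices are not covered.

```scala
object ExplodeMatrixSketch {
  // Hypothetical dense matrix; values are assumed to be stored column by column.
  final case class Dense(numRows: Int, numCols: Int, values: Array[Double])

  /** Emits one array of doubles per matrix row; None (null) or 0 rows gives no output. */
  def explodeByRow(matrix: Option[Dense]): Seq[Array[Double]] = matrix match {
    case None => Seq.empty
    case Some(m) =>
      (0 until m.numRows).map { i =>
        // Element (i, j) lives at offset j * numRows + i in column-major order.
        Array.tabulate(m.numCols)(j => m.values(j * m.numRows + i))
      }
  }
}
```

    For a 2 x 3 matrix whose column-major values are 1.0 through 6.0, this yields the rows (1.0, 3.0, 5.0) and (2.0, 4.0, 6.0).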

  13. case class FirthFit(fitState: FirthNewtonArgs, logLkhd: Double, converged: Boolean, exploded: Boolean) extends Product with Serializable

  14. case class FirthFitState(x: DenseMatrix[Double], nullFitArgs: FirthNewtonArgs, fullFitArgs: FirthNewtonArgs) extends Product with Serializable

  15. class FirthNewtonArgs extends AnyRef

  16. case class GenotypeStates(genotypes: Expression, genotypeInfo: Option[GenotypeInfo]) extends UnaryExpression with ExpectsGenotypeFields with Product with Serializable

    Converts a complex genotype array into an array of ints, where each element is the sum of the calls array for the sample at that position if no calls are missing, or -1 if any calls are missing.
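
    The rule above can be sketched in a few lines of plain Scala. The names GenotypeStatesSketch, genotypeState, and genotypeStates are hypothetical, each sample is reduced to its calls array for simplicity, and the sketch assumes that a missing call is encoded as -1.

```scala
object GenotypeStatesSketch {
  /** Sum of the calls for one sample, or -1 if any call is missing (assumed to be -1). */
  def genotypeState(calls: Seq[Int]): Int =
    if (calls.contains(-1)) -1 else calls.sum

  /** Applies the rule to every sample in the genotype array. */
  def genotypeStates(genotypes: Seq[Seq[Int]]): Seq[Int] =
    genotypes.map(genotypeState)
}
```

    For example, the calls (0, 1), (1, 1), and (-1, 0) map to the states 1, 2, and -1.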

  17. case class GqSummaryStats(child: Expression) extends Expression with Rewrite with Product with Serializable

  18. case class HardCalls(probabilities: Expression, numAlts: Expression, phased: Expression, threshold: Expression) extends Expression with CodegenFallback with ImplicitCastInputTypes with Product with Serializable

    Converts an array of probabilities (most likely the genotype probabilities from a BGEN file) into hard calls. The input probabilities are assumed to be diploid.

    If the input probabilities are phased, each haplotype is called separately by finding the maximum probability greater than the threshold (0.9 by default, à la PLINK). If no probability is greater than the threshold, the call is -1 (missing).

    If the input probabilities are unphased, the probabilities refer to the complete genotype. In this case, we find the maximum probability greater than the threshold and then convert that value to a genotype call.

    If any of the required parameters (probabilities, numAlts, phased) are null, the expression returns null.

    probabilities: The probabilities to convert to hard calls. The algorithm does not check that they sum to 1. If the probabilities are unphased, they are assumed to correspond to the genotypes in colex order, which is standard for both BGEN and VCF files.

    numAlts: The number of alternate alleles at this site.

    phased: Whether the probabilities are phased (per haplotype) or unphased (whole genotype).

    threshold: Calls are only generated if at least one probability is above this threshold.
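
    The calling logic described for HardCalls can be sketched in plain Scala as follows. This is an illustration under stated assumptions, not Glow's code: the names HardCallsSketch, argmaxAbove, and hardCalls are hypothetical, phased probabilities are assumed to be laid out haplotype by haplotype with numAlts + 1 entries each, and an unphased genotype with no probability above the threshold is assumed to yield -1 for both calls.

```scala
object HardCallsSketch {
  // Index of the largest probability if it exceeds the threshold, otherwise None.
  private def argmaxAbove(ps: Seq[Double], threshold: Double): Option[Int] =
    if (ps.isEmpty) None
    else {
      val (p, i) = ps.zipWithIndex.maxBy(_._1)
      if (p > threshold) Some(i) else None
    }

  /** Diploid hard calls; -1 marks a missing call. */
  def hardCalls(
      probabilities: Seq[Double],
      numAlts: Int,
      phased: Boolean,
      threshold: Double = 0.9): Seq[Int] = {
    if (phased) {
      // Assumed layout: one block of (numAlts + 1) allele probabilities per haplotype.
      probabilities
        .grouped(numAlts + 1)
        .map(h => argmaxAbove(h, threshold).getOrElse(-1))
        .toList
    } else {
      argmaxAbove(probabilities, threshold) match {
        case None => Seq(-1, -1) // assumption: both calls are missing
        case Some(idx) =>
          // Colex order: genotype (j, k) with j <= k sits at index k * (k + 1) / 2 + j.
          var k = 0
          while ((k + 1) * (k + 2) / 2 <= idx) k += 1
          Seq(idx - k * (k + 1) / 2, k)
      }
    }
  }
}
```

    With one alternate allele, the unphased probabilities (0.02, 0.95, 0.03) give the calls (0, 1), and the phased probabilities (0.95, 0.05, 0.4, 0.6) give (0, -1) because the second haplotype has no probability above 0.9.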

  19. case class HardyWeinberg(genotypes: Expression, genotypeInfo: Option[GenotypeInfo]) extends UnaryExpression with ExpectsGenotypeFields with Product with Serializable

  20. case class HardyWeinbergStruct(hetFreqHwe: Double, pValueHwe: Double) extends Product with Serializable

  21. case class LRTFitState(x: DenseMatrix[Double], hessian: DenseMatrix[Double], nullFit: NewtonResult, newtonState: NewtonIterationsState) extends Product with Serializable

  22. case class LiftOverCoordinatesExpr(contigName: Expression, start: Expression, end: Expression, chainFile: Expression, minMatchRatio: Expression) extends Expression with CodegenFallback with ImplicitCastInputTypes with Product with Serializable

    Performs lift over from the specified 0-start, half-open interval (contigName, start, end) on the reference sequence to a query sequence, using the specified chain file and minimum fraction of bases that must remap.

    We assume the chain file is a constant value so that the LiftOver object can be reused between rows.

    If any of the required parameters (contigName, start, end) are null, the expression returns null. If minMatchRatioOpt contains null, the expression returns null; if it is empty, we use 0.95 to match LiftOver.DEFAULT_LIFTOVER_MINMATCH.

    contigName: Chromosome name on the reference sequence.

    start: Start position (0-start) on the reference sequence.

    end: End position on the reference sequence.

    chainFile: UCSC chain format file mapping blocks from the reference sequence to the query sequence.

    minMatchRatio: The minimum fraction of bases that must remap to lift over successfully.

  23. case class LinearRegressionExpr(genotypes: Expression, phenotypes: Expression, covariates: Expression) extends TernaryExpression with ImplicitCastInputTypes with Product with Serializable

  24. case class LogisticRegressionExpr(genotypes: Expression, phenotypes: Expression, covariates: Expression, test: Expression) extends QuaternaryExpression with ImplicitCastInputTypes with Product with Serializable

  25. class LogisticRegressionState extends AnyRef

  26. trait LogitTest extends Serializable

    Base trait for logistic regression tests.

  27. case class LogitTestResults(beta: Double, oddsRatio: Double, waldConfidenceInterval: Seq[Double], pValue: Double) extends Product with Serializable

    Statistics returned upon performing a logit test.

    beta: Log-odds associated with the genotype, NaN if the null/full model fit failed.

    oddsRatio: Odds ratio associated with the genotype, NaN if the null/full model fit failed.

    waldConfidenceInterval: Wald 95% confidence interval of the odds ratio, NaN if the null/full model fit failed.

    pValue: P-value for the specified test, NaN if the null/full model fit failed. Determined using the profile likelihood method.

  28. case class MeanSubstitute(array: Expression, missingValue: Expression) extends Expression with RewriteAfterResolution with Product with Serializable

    Substitutes the missing values of an array using the mean of the non-missing values. Values that are NaN, null or equal to the missing value parameter are not included in the aggregation, and are substituted with the mean of the non-missing values. If all values are missing, they are substituted with the missing value.

    If the missing value is not provided, the parameter defaults to -1.
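
    The substitution rule can be sketched in plain Scala. The names MeanSubstituteSketch and meanSubstitute are hypothetical, and Option is used here as a stand-in for nullable array elements.

```scala
object MeanSubstituteSketch {
  /** Replaces missing entries (None, NaN, or equal to missingValue) with the mean of the rest. */
  def meanSubstitute(array: Seq[Option[Double]], missingValue: Double = -1.0): Seq[Double] = {
    // None counts as missing because forall is vacuously true on an empty Option.
    def isMissing(v: Option[Double]): Boolean =
      v.forall(x => x.isNaN || x == missingValue)

    val present: Seq[Double] = array.filterNot(isMissing).flatten
    // If every value is missing, fall back to the missing value itself.
    val fill = if (present.isEmpty) missingValue else present.sum / present.length
    array.map(v => if (isMissing(v)) fill else v.get)
  }
}
```

    For the input (1.0, null, 3.0, -1.0, NaN) with the default missing value, the non-missing mean is 2.0 and the result is (1.0, 2.0, 3.0, 2.0, 2.0).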

  29. case class MomentAggState(count: Long = 0, min: Double = 0, max: Double = 0, mean: Double = 0, m2: Double = 0) extends Product with Serializable

    The state necessary for maintaining moment-based aggregations, currently only supported up to m2.

    This functionality is based on the org.apache.spark.sql.catalyst.expressions.aggregate.CentralMomentAgg implementation in Spark and is used to compute summary statistics on arrays as well as across many rows for sample-based aggregations.
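
    To illustrate how state of this shape supports both element-wise updates and merging of partial results, here is a minimal sketch of a streaming moment update in plain Scala. It uses the standard single-pass formulas for count, mean, and m2 (the sum of squared deviations from the mean) and is not a copy of Glow's or Spark's code; the names MomentSketch, Moments, add, and merge are hypothetical.

```scala
object MomentSketch {
  // Hypothetical state with the same fields as MomentAggState.
  final case class Moments(
      count: Long = 0,
      min: Double = 0,
      max: Double = 0,
      mean: Double = 0,
      m2: Double = 0) {

    /** Folds one new value into the running state. */
    def add(x: Double): Moments = {
      val n = count + 1
      val delta = x - mean
      val newMean = mean + delta / n
      Moments(
        n,
        if (count == 0) x else math.min(min, x),
        if (count == 0) x else math.max(max, x),
        newMean,
        m2 + delta * (x - newMean))
    }

    /** Combines two partial states, for example from different rows or partitions. */
    def merge(other: Moments): Moments =
      if (count == 0) other
      else if (other.count == 0) this
      else {
        val n = count + other.count
        val delta = other.mean - mean
        Moments(
          n,
          math.min(min, other.min),
          math.max(max, other.max),
          mean + delta * other.count / n,
          m2 + other.m2 + delta * delta * count * other.count / n)
      }
  }

  val values: Seq[Double] = Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)

  // Aggregating everything at once and merging two halves should agree.
  val whole: Moments = values.foldLeft(Moments())(_.add(_))
  val merged: Moments =
    values.take(3).foldLeft(Moments())(_.add(_))
      .merge(values.drop(3).foldLeft(Moments())(_.add(_)))
}
```

    For the sample values, both whole and merged have a count of 8, a mean of 5, and an m2 of 32, up to floating-point rounding. A variance or standard deviation can then be derived from m2 and count.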

  30. class NewtonIterationsState extends AnyRef

  31. case class NewtonResult(args: NewtonIterationsState, logLkhd: Double, nIter: Int, converged: Boolean, exploded: Boolean) extends Product with Serializable

  32. case class NormalizeVariantExpr(contigName: Expression, start: Expression, end: Expression, refAllele: Expression, altAlleles: Expression, refGenomePathString: Expression) extends SenaryExpression with ImplicitCastInputTypes with Product with Serializable

  33. case class PerSampleSummaryStatistics(genotypes: Expression, field: Expression, genotypeInfo: Option[GenotypeInfo] = None, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[ArrayBuffer[SampleSummaryStatsState]] with ExpectsGenotypeFields with GlowLogging with Product with Serializable

    Computes summary statistics (count, min, max, mean, stdev) for a numeric genotype field for each sample in a cohort. The field is determined by the provided StructField. If the field does not exist in the genotype struct, an analysis error will be thrown.

    The return type is an array of summary statistics. If sample ids are included in the input, they'll be propagated to the results.

  34. case class RegressionStats(beta: Double, standardError: Double, pValue: Double) extends Product with Serializable

  35. case class SampleCallStats(sampleId: String = null, nCalled: Long = 0, nUncalled: Long = 0, nHomRef: Long = 0, nHet: Long = 0, nHomVar: Long = 0, nInsertion: Long = 0, nDeletion: Long = 0, nTransversion: Long = 0, nTransition: Long = 0, nSpanningDeletion: Long = 0) extends Product with Serializable

  36. case class SampleDpSummaryStatistics(child: Expression) extends Expression with Rewrite with Product with Serializable

  37. case class SampleGqSummaryStatistics(child: Expression) extends Expression with Rewrite with Product with Serializable

  38. case class SampleSummaryStatsState(sampleId: String, momentAggState: MomentAggState) extends Product with Serializable

  39. case class SubsetStruct(struct: Expression, fields: Seq[Expression]) extends Expression with Rewrite with Product with Serializable

  40. case class UnwrappedAggregateByIndex(arr: Expression, initialValue: Expression, update: Expression, merge: Expression, evaluate: Expression) extends DeclarativeAggregate with AggregateByIndex with UnwrappedAggregateFunction with Product with Serializable

  41. trait UnwrappedAggregateFunction extends AggregateFunction

    A hack to make Spark SQL recognize AggregateByIndex as an aggregate expression.

    See io.projectglow.sql.optimizer.ResolveAggregateFunctionsRule for details.

  42. trait VariantType extends AnyRef

  43. case class VectorToArray(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable

  44. case class WrappedAggregateByIndex(arr: Expression, initialValue: Expression, update: Expression, merge: Expression, evaluate: Expression = LambdaFunction.identity) extends DeclarativeAggregate with AggregateByIndex with Product with Serializable

Value Members

  1. object ArrayToDenseVector extends Serializable

  2. object ArrayToSparseVector extends Serializable

  3. object CallStats extends Serializable

  4. object CovariateQRContext extends GlowLogging with Serializable

  5. object FirthTest extends LogitTest

  6. object HardyWeinberg extends Serializable

  7. object LikelihoodRatioTest extends LogitTest

  8. object LinearRegressionExpr extends Serializable

  9. object LinearRegressionGwas extends GlowLogging

  10. object LogisticRegressionExpr extends Serializable

  11. object LogisticRegressionGwas extends GlowLogging

    Some of the logic used for logistic regression is from the Hail project. The Hail project can be found on GitHub: https://github.com/hail-is/hail. The Hail project is under an MIT license: https://github.com/hail-is/hail/blob/master/LICENSE.

  12. object LogitTestResults extends Serializable

  13. object MomentAggState extends GlowLogging with Serializable

  14. object NormalizeVariantExpr extends Serializable

  15. object SampleCallStats extends GlowLogging with Serializable

  16. object VariantQcExprs extends GlowLogging

    Contains implementations of QC functions. These implementations are called during both whole-stage codegen and interpreted execution.

    The functions are exposed to the user as Catalyst expressions.

  17. object VariantType

  18. object VariantUtilExprs

    Implementations of utility functions for transforming variant representations. These implementations are called during both whole-stage codegen and interpreted execution.

    The functions are exposed to the user as Catalyst expressions.

  19. object VectorToArray extends Serializable
