Object

io.projectglow

functions


object functions

Functions provided by Glow. These functions can be used with Spark's DataFrame API.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def add_struct_fields(struct: Column, fields: Column*): Column

    Adds fields to a struct.

    struct

    The struct to which fields will be added

    fields

    The new fields to add. The arguments must alternate between string-typed literal field names and field values.

    returns

    A struct consisting of the input struct and the added fields

    Since

    0.3.0

  5. def aggregate_by_index(arr: Column, initialValue: Column, update: (Column, Column) ⇒ Column, merge: (Column, Column) ⇒ Column): Column

  6. def aggregate_by_index(arr: Column, initialValue: Column, update: (Column, Column) ⇒ Column, merge: (Column, Column) ⇒ Column, evaluate: (Column) ⇒ Column): Column

    Computes custom per-sample aggregates.

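    arr

    An array of values, one element per sample.

    initialValue

    The initial aggregation state for each sample.

    update

    A function that folds an array element into the running state.

    merge

    A function that combines two partial states.

    evaluate

    A function that maps the final state to the output value. Only the five-argument overload accepts it.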

    returns

    An array of aggregated values. The number of elements in the array is equal to the number of samples.

    Since

    0.3.0
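The update/merge/evaluate contract can be illustrated with a plain-Scala model that needs no Spark or Glow dependency. This is only a sketch of the general pattern, not Glow's implementation: the object and method names are hypothetical, rows are modelled as in-memory sequences grouped into "partitions", and the argument order `update(state, element)` is an assumption.

```scala
object AggregateByIndexSketch {
  // Folds every row of one partition into a vector of per-index states.
  // Assumption: update takes (state, element) in that order.
  def aggregatePartition[S, A](
      rows: Seq[Seq[A]],
      initial: S,
      update: (S, A) => S): Vector[S] =
    if (rows.isEmpty) Vector.empty
    else {
      val width = rows.head.length
      rows.foldLeft(Vector.fill(width)(initial)) { (states, row) =>
        require(row.length == width, "all arrays must have the same length")
        states.zip(row).map { case (s, a) => update(s, a) }
      }
    }

  // Aggregates each partition separately, merges the partial states
  // index by index, then applies evaluate to every final state.
  def aggregateByIndex[S, A, R](
      partitions: Seq[Seq[Seq[A]]],
      initial: S,
      update: (S, A) => S,
      merge: (S, S) => S,
      evaluate: S => R): Vector[R] = {
    val partials = partitions
      .map(p => aggregatePartition(p, initial, update))
      .filter(_.nonEmpty)
    if (partials.isEmpty) Vector.empty
    else
      partials
        .reduce((l, r) => l.zip(r).map { case (a, b) => merge(a, b) })
        .map(evaluate)
  }

  // Example: per-index mean, using (sum, count) as the state.
  def perSampleMeans(partitions: Seq[Seq[Seq[Double]]]): Vector[Double] =
    aggregateByIndex[(Double, Int), Double, Double](
      partitions,
      (0.0, 0),
      (s, x) => (s._1 + x, s._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2),
      s => s._1 / s._2)
}
```

The output has one element per array index, which matches the statement that the result length equals the number of samples.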

  7. def array_summary_stats(arr: Column): Column

    Computes the minimum, maximum, mean, and standard deviation of an array of numerics.

    arr

    An array of any numeric type

    returns

    A struct containing double mean, stdDev, min, and max fields

    Since

    0.3.0
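As a rough illustration of the four statistics, the following plain-Scala sketch computes them for an in-memory sequence. The names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the page does not say which estimator Glow uses.

```scala
object ArraySummaryStatsSketch {
  // Mirrors the documented field names: mean, stdDev, min, max.
  final case class Stats(mean: Double, stdDev: Double, min: Double, max: Double)

  // Returns None for an empty input, which has no defined statistics.
  def summarize(values: Seq[Double]): Option[Stats] =
    if (values.isEmpty) None
    else {
      val n = values.length
      val mean = values.sum / n
      // Assumption: sample standard deviation; undefined (NaN) for n == 1.
      val stdDev =
        if (n > 1) math.sqrt(values.map(v => (v - mean) * (v - mean)).sum / (n - 1))
        else Double.NaN
      Some(Stats(mean, stdDev, values.min, values.max))
    }
}
```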

  8. def array_to_dense_vector(arr: Column): Column

    Converts an array of numerics into a spark.ml DenseVector.

    arr

    The array of numerics

    returns

    A spark.ml DenseVector

    Since

    0.3.0

  9. def array_to_sparse_vector(arr: Column): Column

    Converts an array of numerics into a spark.ml SparseVector.

    arr

    The array of numerics

    returns

    A spark.ml SparseVector

    Since

    0.3.0
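The difference between the dense and sparse forms can be shown with a small dependency-free model. A spark.ml SparseVector stores a size plus parallel arrays of indices and values; the `Sparse` case class and helper names below are hypothetical stand-ins for that layout, not the spark.ml classes themselves.

```scala
object SparseVectorSketch {
  // Stand-in for the (size, indices, values) layout of a sparse vector.
  final case class Sparse(size: Int, indices: Vector[Int], values: Vector[Double])

  // Keeps only the non-zero entries together with their positions.
  def toSparse(arr: Seq[Double]): Sparse = {
    val nonZero = arr.zipWithIndex.filter { case (v, _) => v != 0.0 }
    Sparse(arr.length, nonZero.map(_._2).toVector, nonZero.map(_._1).toVector)
  }

  // Expands the sparse form back to a full array, filling gaps with 0.0.
  def toDense(s: Sparse): Vector[Double] = {
    val lookup = s.indices.zip(s.values).toMap
    Vector.tabulate(s.size)(i => lookup.getOrElse(i, 0.0))
  }
}
```

The sparse form pays off when most entries are zero, since only the non-zero positions are stored.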

  10. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  11. def call_summary_stats(genotypes: Column): Column

    Computes call summary statistics for an array of genotype structs. See :ref:variant-qc for more details.

    genotypes

    The array of genotype structs with calls field

    returns

    A struct containing callRate, nCalled, nUncalled, nHet, nHomozygous, nNonRef, nAllelesCalled, alleleCounts, alleleFrequencies fields. See :ref:variant-qc.

    Since

    0.3.0

  12. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  13. def dp_summary_stats(genotypes: Column): Column

    Computes summary statistics for the depth field from an array of genotype structs. See :ref:variant-qc.

    genotypes

    An array of genotype structs with depth field

    returns

    A struct containing mean, stdDev, min, and max of genotype depths

    Since

    0.3.0

  14. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  16. def expand_struct(struct: Column): Column

    Promotes fields of a nested struct to top-level columns similar to using struct.* from SQL, but can be used in more contexts.

    struct

    The struct to expand

    returns

    Columns corresponding to fields of the input struct

    Since

    0.3.0

  17. def explode_matrix(matrix: Column): Column

    Explodes a spark.ml Matrix (sparse or dense) into multiple arrays, one per row of the matrix.

    matrix

    The spark.ml Matrix to explode

    returns

    An array column in which each row is a row of the input matrix

    Since

    0.3.0

  18. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. def genotype_states(genotypes: Column): Column

    Gets the number of alternate alleles for an array of genotype structs. Returns -1 if there are any -1s (no-calls) in the calls array.

    genotypes

    An array of genotype structs with calls field

    returns

    An array of integers containing the number of alternate alleles in each call array

    Since

    0.3.0
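The rule described above can be modelled in plain Scala as follows. This is a sketch under stated assumptions rather than Glow's code: each genotype is reduced to its calls array, calls are biallelic (0 for reference, 1 for alternate, -1 for a no-call), and the names are hypothetical. How Glow counts multi-allelic calls is not stated on this page.

```scala
object GenotypeStatesSketch {
  // One genotype: -1 if any call is missing, else the number of
  // non-reference calls (assumption: biallelic 0/1 encoding).
  def state(calls: Seq[Int]): Int =
    if (calls.contains(-1)) -1 else calls.count(_ > 0)

  // One state per genotype, in input order.
  def genotypeStates(genotypes: Seq[Seq[Int]]): Seq[Int] =
    genotypes.map(state)
}
```

For example, the calls arrays `[0, 0]`, `[0, 1]`, `[1, 1]`, and `[-1, 1]` map to the states 0, 1, 2, and -1.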

  20. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  21. def gq_summary_stats(genotypes: Column): Column

    Computes summary statistics about the genotype quality field for an array of genotype structs. See :ref:variant-qc.

    genotypes

    The array of genotype structs with conditionalQuality field

    returns

    A struct containing mean, stdDev, min, and max of genotype qualities

    Since

    0.3.0

  22. def hard_calls(probabilities: Column, numAlts: Column, phased: Column): Column

  23. def hard_calls(probabilities: Column, numAlts: Column, phased: Column, threshold: Double): Column

    Converts an array of probabilities to hard calls. The probabilities are assumed to be diploid. See :ref:variant-data-transformations for more details.

    probabilities

    The array of probabilities to convert

    numAlts

    The number of alternate alleles

    phased

    Whether the probabilities are phased. If phased, we expect 2 * numAlts values in the probabilities array. If unphased, we expect one probability per possible genotype.

    threshold

    The minimum probability to make a call. If no probability falls into the range of [0, 1 - threshold] or [threshold, 1], a no-call (represented by -1s) is emitted. If not provided, this parameter defaults to 0.9.

    returns

    An array of hard calls

    Since

    0.3.0

  24. def hardy_weinberg(genotypes: Column): Column

    Computes statistics relating to Hardy-Weinberg equilibrium. See :ref:variant-qc for more details.

    genotypes

    The array of genotype structs with calls field

    returns

    A struct containing two fields, hetFreqHwe (the expected heterozygous frequency according to Hardy-Weinberg equilibrium) and pValueHwe (the associated p-value)

    Since

    0.3.0

  25. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  26. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  27. def lift_over_coordinates(contigName: Column, start: Column, end: Column, chainFile: String): Column

  28. def lift_over_coordinates(contigName: Column, start: Column, end: Column, chainFile: String, minMatchRatio: Double): Column

    Performs liftover for the coordinates of a variant. To perform liftover of alleles and add additional metadata, see :ref:liftover.

    contigName

    The current contig name

    start

    The current start

    end

    The current end

    chainFile

    Location of the chain file on each node in the cluster

    minMatchRatio

    Minimum fraction of bases that must remap to do liftover successfully. If not provided, defaults to 0.95.

    returns

    A struct containing contigName, start, and end fields after liftover

    Since

    0.3.0

  29. def linear_regression_gwas(genotypes: Column, phenotypes: Column, covariates: Column): Column

    Performs a linear regression association test optimized for performance in a GWAS setting. See :ref:linear-regression for details.

    genotypes

    A numeric array of genotypes

    phenotypes

    A numeric array of phenotypes

    covariates

    A spark.ml Matrix of covariates

    returns

    A struct containing beta, standardError, and pValue fields. See :ref:linear-regression.

    Since

    0.3.0

  30. def logistic_regression_gwas(genotypes: Column, phenotypes: Column, covariates: Column, test: String): Column

    Performs a logistic regression association test optimized for performance in a GWAS setting. See :ref:logistic-regression for more details.

    genotypes

    A numeric array of genotypes

    phenotypes

    A double array of phenotype values

    covariates

    A spark.ml Matrix of covariates

    test

    Which logistic regression test to use. Can be LRT or Firth

    returns

    A struct containing beta, oddsRatio, waldConfidenceInterval, and pValue fields. See :ref:logistic-regression.

    Since

    0.3.0

  31. def mean_substitute(array: Column): Column

  32. def mean_substitute(array: Column, missingValue: Column): Column

    Substitutes the missing values of a numeric array using the mean of the non-missing values. Any values that are NaN, null or equal to the missing value parameter are considered missing. See :ref:variant-data-transformations for more details.

    array

    A numeric array that may contain missing values

    missingValue

    A value that should be considered missing. If not provided, this parameter defaults to -1.

    returns

    A numeric array with substituted missing values

    Since

    0.4.0
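The substitution rule can be sketched in plain Scala without Spark. In this hypothetical model, `None` stands in for a null element, and returning NaN when every value is missing is an assumption; the page does not specify that case.

```scala
object MeanSubstituteSketch {
  // A value counts as present if it is neither NaN nor the missing marker.
  private def isPresent(v: Double, missingValue: Double): Boolean =
    !v.isNaN && v != missingValue

  // Replaces None (null), NaN, and missingValue entries with the mean
  // of the remaining values. missingValue defaults to -1, as documented.
  def meanSubstitute(
      values: Seq[Option[Double]],
      missingValue: Double = -1.0): Seq[Double] = {
    val present = values.flatten.filter(v => isPresent(v, missingValue))
    // Assumption: with no usable values the mean is undefined (NaN).
    val mean = if (present.isEmpty) Double.NaN else present.sum / present.length
    values.map {
      case Some(v) if isPresent(v, missingValue) => v
      case _ => mean
    }
  }
}
```

With the default marker, the input `[1.0, -1.0, null, 3.0, NaN]` has two usable values whose mean is 2.0, so the three missing positions are each filled with 2.0.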

  33. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  34. def normalize_variant(contigName: Column, start: Column, end: Column, refAllele: Column, altAlleles: Column, refGenomePathString: String): Column

    Normalizes the variant with a behavior similar to vt normalize or bcftools norm. Creates a StructType column including the normalized start, end, referenceAllele and alternateAlleles fields (whether they are changed or unchanged as the result of normalization) as well as a StructType field called normalizationStatus that contains the following fields:

    changed: A boolean field indicating whether the variant data was changed as a result of normalization

    errorMessage: An error message in case the attempt at normalizing the row hit an error. In this case, the changed field will be set to false. If no errors occur, this field will be null.

    In case of an error, the start, end, referenceAllele and alternateAlleles fields in the generated struct will be null.

    contigName

    The current contig name

    start

    The current start

    end

    The current end

    refAllele

    The current reference allele

    altAlleles

    The current array of alternate alleles

    refGenomePathString

    A path to the reference genome .fasta file. The .fasta file must be accompanied by a .fai index file in the same folder.

    returns

    A struct as explained above

    Since

    0.3.0

  35. final def notify(): Unit

    Definition Classes
    AnyRef
  36. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  37. def sample_call_summary_stats(genotypes: Column, refAllele: Column, alternateAlleles: Column): Column

    Computes per-sample call summary statistics. See :ref:sample-qc for more details.

    genotypes

    An array of genotype structs with calls field

    refAllele

    The reference allele

    alternateAlleles

    An array of alternate alleles

    returns

    A struct containing sampleId, callRate, nCalled, nUncalled, nHomRef, nHet, nHomVar, nSnp, nInsertion, nDeletion, nTransition, nTransversion, nSpanningDeletion, rTiTv, rInsertionDeletion, rHetHomVar fields. See :ref:sample-qc.

    Since

    0.3.0

  38. def sample_dp_summary_stats(genotypes: Column): Column

    Computes per-sample summary statistics about the depth field in an array of genotype structs.

    genotypes

    An array of genotype structs with depth field

    returns

    An array of structs where each struct contains mean, stdDev, min, and max of the genotype depths for a sample. If sampleId is present in a genotype, it will be propagated to the resulting struct as an extra field.

    Since

    0.3.0

  39. def sample_gq_summary_stats(genotypes: Column): Column

    Computes per-sample summary statistics about the genotype quality field in an array of genotype structs.

    genotypes

    An array of genotype structs with conditionalQuality field

    returns

    An array of structs where each struct contains mean, stdDev, min, and max of the genotype qualities for a sample. If sampleId is present in a genotype, it will be propagated to the resulting struct as an extra field.

    Since

    0.3.0

  40. def subset_struct(struct: Column, fields: String*): Column

    Selects fields from a struct.

    struct

    Struct from which to select fields

    fields

    Fields to select

    returns

    A struct containing only the indicated fields

    Since

    0.3.0

  41. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  42. def toString(): String

    Definition Classes
    AnyRef → Any
  43. def vector_to_array(vector: Column): Column

    Converts a spark.ml Vector (sparse or dense) to an array of doubles.

    vector

    Vector to convert

    returns

    An array of doubles

    Since

    0.3.0

  44. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  45. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  46. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
