A copy of Hadoop-BAM's BGZF codec that returns a Databricks BGZF output stream.
Identical to org.apache.spark.sql.execution.datasources.HadoopFileLinesReader, but takes the individual fields of org.apache.spark.sql.execution.datasources.PartitionedFile instead of the object itself. PartitionedFile objects cannot be instantiated in DBR because of a binary-incompatible change vs OSS Spark.
Implementation pulled from Hail
Expressions that should be rewritten eagerly. The rewrite must be performable without knowing the datatype or nullability of any of the children.
In general, rewrite expressions should extend this trait unless they have a compelling reason to inspect their children.
Rewrites that depend on child expressions.
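A minimal sketch of the distinction between the two rewrite styles, using a toy expression tree. The names here (Lit, Add, eagerRewrite, evaluate) are illustrative stand-ins, not the actual Catalyst or Glow API:

```scala
// Toy expression tree illustrating eager vs resolution-dependent
// rewrites. Names are hypothetical, not the real Catalyst/Glow API.
sealed trait Expr
case class Lit(v: Int) extends Expr
case class Add(l: Expr, r: Expr) extends Expr

// Eager rewrite: purely structural, so it needs no information about
// the child's datatype or nullability (here, an "increment" desugars
// to Add(child, Lit(1)) without ever looking inside child).
def eagerRewrite(child: Expr): Expr = Add(child, Lit(1))

// Resolution-dependent processing: must inspect (here, evaluate) its
// children before it can produce a result.
def evaluate(e: Expr): Int = e match {
  case Lit(v)    => v
  case Add(l, r) => evaluate(l) + evaluate(r)
}
```

The eager form is preferable when possible because it can run before analysis has resolved the children.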
A convenience class to help convert from objects to Spark InternalRows.
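One plausible shape for such a converter, sketched here with InternalRow modeled as a plain Seq[Any] (the real class lives in Spark's catalyst package); the names RowConverter and Sample are hypothetical:

```scala
// Hedged sketch of an object-to-row converter. InternalRow is modeled
// as a plain Seq[Any]; RowConverter and Sample are hypothetical names.
case class Sample(id: String, quality: Double)

// The converter holds one extractor function per output column, in
// schema order, and applies each to the object to build a "row".
class RowConverter[T](fieldExtractors: Seq[T => Any]) {
  def apply(obj: T): Seq[Any] = fieldExtractors.map(_(obj))
}

val toRow = new RowConverter[Sample](Seq(_.id, _.quality))
```

Holding the extractors in schema order keeps the conversion a single pass over the columns.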
Remove the prefix com.databricks.
A trait to simplify type checking and reading for expressions that operate on arrays of genotype data with the expectation that certain fields exist.
Note: This trait introduces complexity during resolution and analysis, and prevents nested column pruning. Prefer writing new functions as rewrites when possible.
(Since version 0.4.1) Write functions as rewrites when possible
Stores the indices of required and optional fields within the genotype element struct after resolution.
The number of fields in the struct.
The indices of required fields. 0 <= idx < size.
The indices of optional fields. -1 if the field is not present.
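A sketch of what such an index holder might look like; the class and field names here are hypothetical, chosen only to mirror the invariants described above:

```scala
// Sketch of the index holder described above: required fields resolve
// to valid indices; optional fields use -1 when absent. The class and
// member names are hypothetical.
case class GenotypeFieldIndices(
    size: Int,               // number of fields in the struct
    requiredIdx: Seq[Int],   // each index satisfies 0 <= idx < size
    optionalIdx: Seq[Int]) { // -1 marks an absent optional field
  require(requiredIdx.forall(i => 0 <= i && i < size))
  require(optionalIdx.forall(i => i == -1 || (0 <= i && i < size)))

  // True if the optional field at this position was found.
  def hasOptionalField(pos: Int): Boolean = optionalIdx(pos) != -1
}

val indices = GenotypeFieldIndices(3, Seq(0, 1), Seq(2, -1))
```

Resolving the indices once, at analysis time, avoids repeated field lookups per genotype element at evaluation time.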