Metrics produced by AssessPhasing
describing various statistics assessing the performance of phasing variants
relative to a known set of phased variant calls. Included are methods for assessing sensitivity and accuracy from
a number of previous papers (e.g. http://dx.doi.org/10.1038%2Fng.3119).
The N50, N90, and L50 statistics are defined as follows:
- The N50 is the longest block length such that the bases covered by all blocks this length and longer are at least
50% of the number of bases covered by all blocks.
- The N90 is the longest block length such that the bases covered by all blocks this length and longer are at least
90% of the number of bases covered by all blocks.
- The L50 is the smallest number of blocks such that the sum of the lengths of the blocks is >= 50% of the sum of
the lengths of all blocks.
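The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own implementation: the function names `nx` and `l50` are hypothetical, block lengths are assumed to be positive integers, and an empty input is assumed to yield 0.

```python
from typing import Sequence


def nx(lengths: Sequence[int], percent: int) -> int:
    """Longest block length L such that blocks of length >= L cover at
    least `percent` % of the bases covered by all blocks (N50 -> 50)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point edge cases at the threshold.
        if running * 100 >= percent * total:
            return length
    return 0  # assumption: no blocks -> 0


def l50(lengths: Sequence[int]) -> int:
    """Smallest number of blocks whose summed length is >= 50% of the total."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:
            return count
    return 0  # assumption: no blocks -> 0


block_lengths = [10, 8, 6, 4, 2]   # 30 bases in total
n50 = nx(block_lengths, 50)        # 10 + 8 = 18 >= 15, so N50 is 8
n90 = nx(block_lengths, 90)        # 10 + 8 + 6 + 4 = 28 >= 27, so N90 is 4
l50_value = l50(block_lengths)     # two blocks reach 50%, so L50 is 2
```

Walking the lengths from longest to shortest works for all three statistics because the first block that pushes the running sum past the threshold is both the N-value (its length) and the L-value (its rank).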
The number of variants called.
The number of variants called with phase.
The number of variants with known truth genotypes.
The number of variants with known truth genotypes with phase.
The number of variants called that had a known phased genotype.
The number of variants called with phase that had a known phased genotype.
The number of known phased variants that were in a called phased block.
The number of called phased variants that had a known phased genotype in a called phased block.
The number of short switch errors (isolated switch errors).
The number of long switch errors (the number of runs of consecutive switch errors).
The number of sites that could be (short or long) switch errors (i.e. the number of sites with both known and called phased variants).
The number of point switch errors (defined in http://dx.doi.org/10.1038%2Fng.3119).
The number of long switch errors (defined in http://dx.doi.org/10.1038%2Fng.3119).
The number of sites that could be (point or long) switch errors (defined in http://dx.doi.org/10.1038%2Fng.3119).
The fraction of called variants with phase.
The fraction of known phased variants called with phase.
The fraction of phased known genotypes in a called phased block.
The fraction of called phased variants that had a known phased genotype in a called phased block.
The fraction of switch sites without short switch errors (1 - (num_short_switch_errors / num_switch_sites)).
The fraction of switch sites without long switch errors (1 - (num_long_switch_errors / num_switch_sites)).
The fraction of switch sites without point switch errors according to the Illumina
method defining switch sites and errors (1 - (num_illumina_point_switch_errors / num_illumina_switch_sites)).
The fraction of switch sites without long switch errors according to the Illumina
method defining switch sites and errors (1 - (num_illumina_long_switch_errors / num_illumina_switch_sites)).
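The four accuracy fractions above share one formula, 1 - (errors / switch sites). A minimal sketch follows; the helper name `switch_accuracy` is hypothetical, and returning NaN when there are no switch sites is an assumption made here, not documented behavior.

```python
import math


def switch_accuracy(num_errors: int, num_switch_sites: int) -> float:
    """Fraction of switch sites without a switch error: 1 - errors / sites."""
    if num_switch_sites == 0:
        # Assumption: the fraction is undefined when there are no switch sites.
        return math.nan
    return 1.0 - num_errors / num_switch_sites


# Example with made-up counts: 25 short switch errors over 100 switch sites.
short_accuracy = switch_accuracy(25, 100)  # 0.75
```

The same helper would apply to the Illumina-style metrics by passing the Illumina error count together with the Illumina switch-site count.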
The mean phased block length in the callset.
The median phased block length in the callset.
The standard deviation of the phased block length in the callset.
The N50 of the phased block length in the callset.
The N90 of the phased block length in the callset.
The L50 of the phased block length in the callset.
The mean phased block length in the truth.
The median phased block length in the truth.
The standard deviation of the phased block length in the truth.
The N50 of the phased block length in the truth.
The N90 of the phased block length in the truth.
The L50 of the phased block length in the truth.
Iterates over multiple variant context iterators such that we return a list of contexts for the union of sites across the iterators. If samples are given, each variant context is subset to just those samples.
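The union-of-sites iteration can be illustrated with a small Python sketch. It is not the library's implementation: records are simplified to hypothetical `(contig_index, position, payload)` tuples, each input is assumed to be sorted by contig index and position, and an input that has no record at a site is assumed to contribute `None`. Sample subsetting is left out.

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator, List, Optional, Tuple

Site = Tuple[int, int]          # (contig index, 1-based position)
Record = Tuple[int, int, str]   # hypothetical stand-in for a variant context


def _tag(index: int, source: Iterable[Record]) -> Iterator[Tuple[Site, int, Record]]:
    """Attach the site and the source index to every record."""
    for record in source:
        yield (record[0], record[1]), index, record


def union_of_sites(
    sources: List[Iterable[Record]],
) -> Iterator[Tuple[Site, List[Optional[Record]]]]:
    """Yield one row per site seen in any source, with one slot per source."""
    streams = [_tag(i, s) for i, s in enumerate(sources)]
    # Merge the sorted streams lazily, ordered by site and then source index.
    merged = heapq.merge(*streams, key=lambda t: (t[0], t[1]))
    for site, group in groupby(merged, key=lambda t: t[0]):
        row: List[Optional[Record]] = [None] * len(sources)
        for _, index, record in group:
            # Assumption: if a source repeats a site, its last record wins.
            row[index] = record
        yield site, row


calls = [(0, 100, "call-A"), (0, 200, "call-B")]
truth = [(0, 200, "truth-B"), (1, 50, "truth-C")]
rows = list(union_of_sites([calls, truth]))
# rows[1] is ((0, 200), [(0, 200, "call-B"), (0, 200, "truth-B")])
```

Keeping one slot per source, rather than a plain list of whatever was found, lets the caller tell which input was missing at a site.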
Creates a VCF by mixing two germline samples at a given proportion.
Metrics produced by AssessPhasing
describing the number of phased blocks of a given length. The output will have
multiple rows, one for each observed phased block length.
The name of the dataset being assessed (i.e. "truth" or "called").
The length of the phased block.
The number of phased blocks of the given length.
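As an illustration of the one-row-per-observed-length layout, the sketch below tallies block lengths into (dataset, length, count) rows. The function name `block_length_rows` and the tuple layout are hypothetical, and sorting rows by length is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def block_length_rows(dataset: str, lengths: Iterable[int]) -> List[Tuple[str, int, int]]:
    """Return one (dataset, block length, count) row per observed length."""
    counts = Counter(lengths)
    # Assumption: rows are emitted in increasing order of block length.
    return [(dataset, length, counts[length]) for length in sorted(counts)]


rows_called = block_length_rows("called", [3, 1, 3, 2, 3])
# [("called", 1, 1), ("called", 2, 1), ("called", 3, 3)]
```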
Simple mask that loads variants one reference sequence at a time and creates a compact representation allowing for rapid querying of whether or not positions are overlapped by one or more variants.
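One way to picture such a mask is a bit set per reference sequence that is rebuilt whenever a query moves to a different sequence. The sketch below works under stated assumptions: the class name `VariantMask`, its constructor arguments, and the in-memory interval dictionary are all hypothetical, whereas a real mask would read variants from a VCF.

```python
from typing import Dict, List, Optional, Tuple


class VariantMask:
    """Bit-set mask of variant positions, holding one contig in memory at a time."""

    def __init__(
        self,
        intervals: Dict[str, List[Tuple[int, int]]],  # contig -> 1-based inclusive spans
        contig_lengths: Dict[str, int],
    ) -> None:
        self._intervals = intervals
        self._lengths = contig_lengths
        self._contig: Optional[str] = None
        self._bits = bytearray()

    def _load(self, contig: str) -> None:
        length = self._lengths[contig]
        # One bit per position 0..length; position 0 is unused (1-based coordinates).
        bits = bytearray((length + 8) // 8)
        for start, end in self._intervals.get(contig, []):
            for pos in range(start, min(end, length) + 1):
                bits[pos >> 3] |= 1 << (pos & 7)
        self._contig, self._bits = contig, bits

    def is_variant(self, contig: str, pos: int) -> bool:
        """True if the 1-based position is overlapped by at least one variant."""
        if contig != self._contig:
            self._load(contig)  # replaces the previously loaded contig
        return bool(self._bits[pos >> 3] & (1 << (pos & 7)))


mask = VariantMask({"chr1": [(5, 5), (10, 12)]}, {"chr1": 20, "chr2": 10})
hit = mask.is_variant("chr1", 11)    # True: inside the 10-12 span
miss = mask.is_variant("chr2", 5)    # False: no variants on chr2
```

A bit per base costs roughly length / 8 bytes per contig, which is why only one contig is held at a time; this design favors queries that proceed in reference order.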