object CollectDuplexSeqMetrics
Companion object for CollectDuplexSeqMetrics that contains various constants and types, including all the various com.fulcrumgenomics.util.Metric sub-types produced by the program.
Type Members
- case class DuplexFamilySizeMetric(ab_size: Int, ba_size: Int, count: Count = 0, fraction: Proportion = 0, fraction_gt_or_eq_size: Proportion = 0) extends Metric with Ordered[DuplexFamilySizeMetric] with Product with Serializable
Metrics produced by CollectDuplexSeqMetrics to describe the distribution of double-stranded (duplex) tag families in terms of the number of reads observed on each strand.
We refer to the two strands as ab and ba because we identify them by observing the same pair of UMIs (A and B) in opposite order (A->B vs. B->A). Which strand is ab and which is ba is largely arbitrary, so to simplify interpretation of the metrics we define, for a given tag family, ab as the sub-family with more reads and ba as the sub-family with fewer reads.
- ab_size
The number of reads in the ab sub-family (the larger sub-family) for this double-strand tag family.
- ba_size
The number of reads in the ba sub-family (the smaller sub-family) for this double-strand tag family.
- count
The number of families whose ab and ba single-strand families have sizes ab_size and ba_size.
- fraction
The fraction of all double-stranded tag families that have ab_size and ba_size.
- fraction_gt_or_eq_size
The fraction of all double-stranded tag families that have ab reads >= ab_size and ba reads >= ba_size.
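The ab/ba convention and the two fraction fields can be illustrated with a minimal sketch. It is not the program's implementation: the object `DuplexFamilySizeSketch`, its `Row` class, and the input representation (one pair of per-strand read counts per tag family) are all hypothetical names chosen for this example.

```scala
object DuplexFamilySizeSketch {
  /** Orders two per-strand read counts so that ab is the larger and ba the smaller. */
  def abBa(strand1Reads: Int, strand2Reads: Int): (Int, Int) =
    (math.max(strand1Reads, strand2Reads), math.min(strand1Reads, strand2Reads))

  /** Hypothetical row type mirroring the fields documented above. */
  final case class Row(abSize: Int, baSize: Int, count: Long, fraction: Double, fractionGtOrEqSize: Double)

  /** Builds one row per observed (ab_size, ba_size) combination. */
  def rows(families: Seq[(Int, Int)]): Seq[Row] = {
    // Count the families that share each (ab, ba) size combination.
    val counts = families
      .map { case (s1, s2) => abBa(s1, s2) }
      .groupBy(identity)
      .map { case (sizes, group) => (sizes, group.size.toLong) }
    val total = counts.values.sum.toDouble

    counts.toSeq.sortBy(_._1).map { case ((ab, ba), n) =>
      // Families with at least ab reads on the larger strand AND at least ba on the smaller.
      val gtOrEq = counts.iterator.collect { case ((a, b), c) if a >= ab && b >= ba => c }.sum
      Row(ab, ba, n, n / total, gtOrEq / total)
    }
  }
}
```

Note that in this sketch fraction_gt_or_eq_size requires both conditions to hold at once, so it is a two-dimensional cumulative fraction rather than a cumulative sum along either axis alone.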
- case class DuplexUmiMetric(umi: String, raw_observations: Count = 0, raw_observations_with_errors: Count = 0, unique_observations: Count = 0, fraction_raw_observations: Proportion = 0, fraction_unique_observations: Proportion = 0, fraction_unique_observations_expected: Proportion = 0) extends Metric with Product with Serializable
Metrics produced by CollectDuplexSeqMetrics describing the set of observed duplex UMI sequences and the frequency of their observations. The UMI sequences reported may have been corrected using information within a double-stranded tag family. For example, if a tag family consists of three read pairs with UMIs ACGT-TGGT, ACGT-TGGT, and ACGT-TGGG, then a consensus UMI of ACGT-TGGT will be generated.
UMI pairs are normalized within a tag family so that observations are always reported as if they came from a read pair with read 1 on the positive strand (F1R2). Another way to view this is that for FR or RF read pairs, the duplex UMI reported is the UMI from the positive strand read followed by the UMI from the negative strand read. E.g. a read pair with UMI AAAA-GGGG, with R1 on the negative strand and R2 on the positive strand, will be reported as GGGG-AAAA.
- umi
The duplex UMI sequence, possibly corrected.
- raw_observations
The number of read pairs in the input BAM that observe the duplex UMI (after correction).
- raw_observations_with_errors
The subset of raw observations that underwent any correction.
- unique_observations
The number of double-stranded tag families (i.e. unique double-stranded molecules) that observed the duplex UMI.
- fraction_raw_observations
The fraction of all raw observations that the duplex UMI accounts for.
- fraction_unique_observations
The fraction of all unique observations that the duplex UMI accounts for.
- fraction_unique_observations_expected
The fraction of all unique observations that are expected to be attributed to the duplex UMI based on the fraction_unique_observations of the two individual UMIs.
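The strand normalization and the expected fraction can be sketched as follows. This is an illustration under stated assumptions, not the program's code: `DuplexUmiSketch`, `normalize`, `expectedFraction`, and the `r1Positive` flag are hypothetical, and the expected fraction is assumed here to be the product of the two individual UMIs' fraction_unique_observations (that is, the two UMIs are treated as independent), which the text above does not state explicitly.

```scala
object DuplexUmiSketch {
  /** Reports a duplex UMI as positive-strand UMI followed by negative-strand UMI.
    * `umi` is "<R1 UMI>-<R2 UMI>"; `r1Positive` (hypothetical flag) is true when
    * read 1 maps to the positive strand, i.e. the pair is already F1R2. */
  def normalize(umi: String, r1Positive: Boolean): String =
    umi.split('-') match {
      case Array(r1Umi, r2Umi) => if (r1Positive) s"$r1Umi-$r2Umi" else s"$r2Umi-$r1Umi"
      case _                   => throw new IllegalArgumentException(s"Not a duplex UMI: $umi")
    }

  /** ASSUMPTION: expected fraction = product of the two single-UMI fractions. */
  def expectedFraction(duplexUmi: String, singleUmiFractions: Map[String, Double]): Double =
    duplexUmi.split('-') match {
      case Array(first, second) => singleUmiFractions(first) * singleUmiFractions(second)
      case _                    => throw new IllegalArgumentException(s"Not a duplex UMI: $duplexUmi")
    }
}
```

Under that assumption, comparing fraction_unique_observations with fraction_unique_observations_expected shows which UMI pairs are over- or under-represented relative to what the individual UMI frequencies would predict.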
- case class DuplexYieldMetric(fraction: Proportion, read_pairs: Count, cs_families: Count, ss_families: Count, ds_families: Count, ds_duplexes: Count, ds_fraction_duplexes: Proportion, ds_fraction_duplexes_ideal: Proportion) extends Metric with Product with Serializable
Metrics produced by CollectDuplexSeqMetrics that are sampled at various levels of coverage, via random downsampling, during the construction of duplex metrics. The downsampling is done in such a way that the fractions are approximate rather than exact; the fraction field should therefore be interpreted only as a guide, and the read_pairs field used to quantify how much data was used.
See FamilySizeMetric for detailed definitions of CS, SS and DS as used below.
- fraction
The approximate fraction of the full dataset that was used to generate the remaining values.
- read_pairs
The number of read pairs upon which the remaining metrics are based.
- cs_families
The number of _CS_ (Coordinate & Strand) families present in the data.
- ss_families
The number of _SS_ (Single-Strand by UMI) families present in the data.
- ds_families
The number of _DS_ (Double-Strand by UMI) families present in the data.
- ds_duplexes
The number of _DS_ families that had the minimum number of observations on both strands to be called duplexes (default = 1 read on each strand).
- ds_fraction_duplexes
The fraction of _DS_ families that are duplexes (ds_duplexes / ds_families).
- ds_fraction_duplexes_ideal
The fraction of _DS_ families that should be duplexes under an idealized model where the two strands, A and B, have equal probability of being sampled, given the observed distribution of _DS_ family sizes.
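One way to read the two duplex fractions is sketched below. The names `DuplexYieldSketch`, `fractionDuplexes`, `probDuplex`, and `idealFraction` are hypothetical. The idealized model is written out under two assumptions that go beyond the text above: each read in a DS family is drawn independently from strand A or B with probability 0.5, and a duplex needs only the default minimum of 1 read on each strand. The program's actual calculation may differ.

```scala
object DuplexYieldSketch {
  /** ds_fraction_duplexes = ds_duplexes / ds_families (0 when there are no families). */
  def fractionDuplexes(dsDuplexes: Long, dsFamilies: Long): Double =
    if (dsFamilies == 0) 0.0 else dsDuplexes.toDouble / dsFamilies

  /** ASSUMPTION: probability that a family of `size` reads has at least one read
    * from each strand when every read is A or B with probability 0.5.
    * This is 1 - P(all reads from A) - P(all reads from B) = 1 - 2 * 0.5^size. */
  def probDuplex(size: Int): Double =
    if (size < 2) 0.0 else 1.0 - 2.0 * math.pow(0.5, size)

  /** Expected duplex fraction given the observed DS family-size distribution
    * (`sizeCounts` maps family size to the number of DS families of that size). */
  def idealFraction(sizeCounts: Map[Int, Long]): Double = {
    val total = sizeCounts.values.sum
    if (total == 0) 0.0
    else sizeCounts.iterator.map { case (size, n) => n * probDuplex(size) }.sum / total
  }
}
```

In this reading, a ds_fraction_duplexes well below ds_fraction_duplexes_ideal suggests the two strands are not being recovered equally, rather than that sequencing depth is too low.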
- case class FamilySizeMetric(family_size: Int, cs_count: Count = 0, cs_fraction: Proportion = 0, cs_fraction_gt_or_eq_size: Proportion = 0, ss_count: Count = 0, ss_fraction: Proportion = 0, ss_fraction_gt_or_eq_size: Proportion = 0, ds_count: Count = 0, ds_fraction: Proportion = 0, ds_fraction_gt_or_eq_size: Proportion = 0) extends Metric with Product with Serializable
Metrics produced by CollectDuplexSeqMetrics to quantify the distribution of different kinds of read family sizes. Three kinds of families are described:
1. _CS_ or _Coordinate & Strand_: families of reads that are grouped together by their unclipped 5' genomic positions and strands, just as they are in traditional PCR duplicate marking.
2. _SS_ or _Single Strand_: single-strand families that are each subsets of a CS family, created by also using the UMIs to partition the larger family, but not linking up families that are created from opposing strands of the same source molecule.
3. _DS_ or _Double Strand_: families that are created by combining single-strand families that are from opposite strands of the same source molecule. This does **not** imply that all DS families are composed of reads from both strands; where only one strand of a source molecule is observed a DS family is still counted.
- family_size
The family size, i.e. the number of read pairs grouped together into a family.
- cs_count
The count of families with size == family_size when grouping just by coordinates and strand information.
- cs_fraction
The fraction of all _CS_ families where size == family_size.
- cs_fraction_gt_or_eq_size
The fraction of all _CS_ families where size >= family_size.
- ss_count
The count of families with size == family_size when also grouping by UMI to create single-strand families.
- ss_fraction
The fraction of all _SS_ families where size == family_size.
- ss_fraction_gt_or_eq_size
The fraction of all _SS_ families where size >= family_size.
- ds_count
The count of families with size == family_size when also grouping by UMI and merging single-strand families from opposite strands of the same source molecule.
- ds_fraction
The fraction of all _DS_ families where size == family_size.
- ds_fraction_gt_or_eq_size
The fraction of all _DS_ families where size >= family_size.
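The count, fraction, and fraction_gt_or_eq_size columns follow the same pattern for each of the CS, SS, and DS groupings. The sketch below shows that pattern for a single grouping; `FamilySizeSketch` and its `Row` class are hypothetical names, and the input is assumed to be simply the size of every family of one kind.

```scala
object FamilySizeSketch {
  /** Hypothetical row for one kind of family (CS, SS, or DS). */
  final case class Row(familySize: Int, count: Long, fraction: Double, fractionGtOrEqSize: Double)

  /** Builds one row per observed family size from a list of family sizes. */
  def rows(familySizes: Seq[Int]): Seq[Row] = {
    // Number of families observed at each size.
    val counts = familySizes.groupBy(identity).map { case (size, group) => (size, group.size.toLong) }
    val total  = familySizes.size.toDouble

    counts.keys.toSeq.sorted.map { size =>
      // Families whose size is at least the current size.
      val atLeast = counts.iterator.collect { case (s, n) if s >= size => n }.sum
      Row(size, counts(size), counts(size) / total, atLeast / total)
    }
  }
}
```

For example, four families of sizes 1, 1, 2, and 4 give fractions 0.5, 0.25, and 0.25 for sizes 1, 2, and 4, and cumulative fractions 1.0, 0.5, and 0.25.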
- case class UmiMetric(umi: String, raw_observations: Count = 0, raw_observations_with_errors: Count = 0, unique_observations: Count = 0, fraction_raw_observations: Proportion = 0, fraction_unique_observations: Proportion = 0) extends Metric with Product with Serializable
Metrics produced by CollectDuplexSeqMetrics describing the set of observed UMI sequences and the frequency of their observations. The UMI sequences reported may have been corrected using information within a double-stranded tag family. For example, if a tag family consists of three read pairs with UMIs ACGT-TGGT, ACGT-TGGT, and ACGT-TGGG, then a consensus UMI of ACGT-TGGT will be generated, three raw observations will be counted for each of ACGT and TGGT, and no observations will be counted for TGGG.
- umi
The UMI sequence, possibly corrected.
- raw_observations
The number of read pairs in the input BAM that observe the UMI (after correction).
- raw_observations_with_errors
The subset of raw observations that underwent any correction.
- unique_observations
The number of double-stranded tag families (i.e. unique double-stranded molecules) that observed the UMI.
- fraction_raw_observations
The fraction of all raw observations that the UMI accounts for.
- fraction_unique_observations
The fraction of all unique observations that the UMI accounts for.
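The worked example above (ACGT-TGGT, ACGT-TGGT, ACGT-TGGG) can be reproduced with a small sketch. How the program actually derives the consensus is not stated here, so the per-position majority vote below is an assumption, and `UmiCountingSketch`, `consensus`, and `count` are hypothetical names. The sketch also assumes all read pairs in the family are already in the same orientation.

```scala
object UmiCountingSketch {
  /** ASSUMPTION: consensus by per-position majority vote over equal-length UMIs. */
  def consensus(umis: Seq[String]): String = {
    require(umis.nonEmpty && umis.forall(_.length == umis.head.length),
      "UMIs must be non-empty and of equal length")
    umis.head.indices.map { i =>
      // Most frequent base at position i across all observed UMIs.
      umis.map(_.charAt(i)).groupBy(identity).maxBy(_._2.size)._1
    }.mkString
  }

  /** For one tag family, maps each consensus UMI to
    * (raw_observations, raw_observations_with_errors).
    * `family` holds the (first UMI, second UMI) of every read pair.
    * If both consensus UMIs were identical, a fuller version would sum their
    * counts; this sketch keeps only one entry in that case. */
  def count(family: Seq[(String, String)]): Map[String, (Int, Int)] =
    Seq(family.map(_._1), family.map(_._2)).map { observed =>
      val corrected = consensus(observed)
      (corrected, (observed.size, observed.count(_ != corrected)))
    }.toMap
}
```

Applied to the example family, this yields three raw observations for ACGT with none corrected, three for TGGT with one corrected, and no entry for TGGG.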
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- val DuplexFamilySizeMetricsExt: String
- val DuplexUmiMetricsExt: String
- val FamilySizeMetricsExt: String
- val PlotsExt: String
- val UmiMetricsExt: String
- val YieldMetricsExt: String
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()