the prefix to apply to all consensus read names
the read group ID to apply to all created consensus reads
the minimum input base quality score to use a raw read's base
if true, quality trim reads in addition to masking. If false just mask.
the estimated rate of errors in the DNA prior to attaching UMIs
the estimated rate of errors in the DNA post attaching UMIs
the minimum number of input reads to a consensus read (see CallDuplexConsensusReads).
Returns the number of consensus reads constructed by this caller.
Takes in all the reads for a source molecule and, if possible, generates one or more output consensus reads as SAM records.
the full set of source SamRecords for a source molecule
a seq of consensus SAM records, may be empty
Takes in all the reads for a source molecule and, if possible, generates one or more output consensus reads as SAM records.
the full set of source SamRecords for a source molecule
a seq of consensus SAM records, may be empty
Creates a SamRecord with many additional tags annotating the duplex read.
the estimated rate of errors in the DNA post attaching UMIs
the estimated rate of errors in the DNA prior to attaching UMIs
Takes in a non-empty seq of SamRecords and filters them such that the returned seq only contains those reads that share the most common alignment of the read sequence to the reference. If two or more different alignments share equal numbers of reads, the 'most common' will be an arbitrary pick amongst those alignments, and the group of reads with that alignment will be returned.
For the purposes of this method all that is implied by "same alignment" is that any insertions or deletions are at the same position and of the same length. This is done to allow for differential read length (either due to sequencing or untracked hard-clipping of adapters) and for differential soft-clipping at the starts and ends of reads.
NOTE: filtered-out reads are sent to the rejectRecords method and need no further handling.
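The filtering rule described above can be illustrated with a minimal sketch. It is not the caller's actual implementation: `Read`, `Indel`, and `MostCommonAlignment` are hypothetical stand-ins for SamRecord and its alignment, and the sketch assumes that "same alignment" can be reduced to an exact match on the list of (position, type, length) indels.

```scala
object MostCommonAlignment {
  /** Hypothetical indel description: reference position, 'I' or 'D', and length. */
  final case class Indel(refPos: Int, kind: Char, length: Int)

  /** Hypothetical, simplified stand-in for a SAM record. */
  final case class Read(name: String, indels: Seq[Indel])

  /** Returns (kept, rejected), where the kept reads share the most frequent indel signature.
    * Ties are broken arbitrarily, as the documentation above allows. */
  def filter(reads: Seq[Read]): (Seq[Read], Seq[Read]) = {
    require(reads.nonEmpty, "reads must be non-empty")
    val groups = reads.groupBy(_.indels)
    val (bestKey, _) = groups.maxBy { case (_, rs) => rs.size }
    // partition preserves input order; the rejected reads would go to rejectRecords
    reads.partition(_.indels == bestKey)
  }
}
```

Soft-clipping and read length are deliberately absent from the comparison key, which mirrors the stated intent of tolerating differences in both.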
Logs statistics about how many reads were seen, and how many were filtered/discarded due to various filters.
the minimum input base quality score to use a raw read's base
the minimum number of input reads to a consensus read (see CallDuplexConsensusReads).
the read group ID to apply to all created consensus reads
the prefix to apply to all consensus read names
Returns the number of raw reads filtered out due to there being insufficient reads present to build the necessary set of consensus reads.
Returns the number of raw reads filtered out because their alignment disagreed with the majority alignment of all raw reads for the same source molecule.
Records that the supplied records were rejected, and not used to build a consensus read.
Returns the MI tag minus the trailing suffix that identifies /A vs /B
a SamRecord
an identifier for the source molecule
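As an illustration of the suffix handling, here is a minimal sketch that operates on the tag value as a plain string. `MiTag` and `sourceMoleculeId` are hypothetical names, and leaving untagged values unchanged is an assumption of the sketch rather than documented behavior.

```scala
object MiTag {
  /** Strips a trailing "/A" or "/B" strand suffix from an MI tag value.
    * Values without such a suffix are returned unchanged (an assumption of this sketch). */
  def sourceMoleculeId(mi: String): String =
    if (mi.endsWith("/A") || mi.endsWith("/B")) mi.dropRight(2) else mi
}
```

With this sketch, `"12/A"` and `"12/B"` both map to `"12"`, so reads from either strand of a molecule fall into the same group.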
Split records into those that should make a single-end consensus read, first of pair consensus read, and second of pair consensus read, respectively. The default method is to use the SAM flag to find unpaired reads, first of pair reads, and second of pair reads.
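The default flag-based split can be sketched as follows. The bit values come from the SAM specification (0x1 paired, 0x40 first of pair, 0x80 second of pair); `Rec` and `SplitByFlag` are hypothetical stand-ins, and dropping paired reads that carry neither pair flag is a simplification made here.

```scala
object SplitByFlag {
  /** Hypothetical, simplified stand-in for a SAM record: a name and its FLAG field. */
  final case class Rec(name: String, flags: Int)

  private val Paired       = 0x1
  private val FirstOfPair  = 0x40
  private val SecondOfPair = 0x80

  /** Returns (single-end, first of pair, second of pair) records, in that order. */
  def split(recs: Seq[Rec]): (Seq[Rec], Seq[Rec], Seq[Rec]) = {
    val (paired, fragments) = recs.partition(r => (r.flags & Paired) != 0)
    val (firsts, rest)      = paired.partition(r => (r.flags & FirstOfPair) != 0)
    // Paired reads with neither pair flag set are dropped in this sketch
    val seconds = rest.filter(r => (r.flags & SecondOfPair) != 0)
    (fragments, firsts, seconds)
  }
}
```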
Sums a short array into an Int to avoid overflow.
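A minimal sketch of why the accumulator type matters: summing in a Short wraps once the total passes 32767, whereas accumulating in an Int does not for realistic array sizes. `ShortSum` is a hypothetical name, not the method documented here.

```scala
object ShortSum {
  /** Sums Short values using an Int accumulator so that the total cannot wrap at 32767. */
  def sum(values: Array[Short]): Int = {
    var total = 0
    var i = 0
    while (i < values.length) {
      total += values(i) // each Short is widened to Int before the addition
      i += 1
    }
    total
  }
}
```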
Converts from a SamRecord into a SourceRead. During conversion the record is end-trimmed to remove Ns and bases below the minBaseQuality. Remaining bases that are below minBaseQuality are then masked to Ns.
Some(SourceRead) if there are any called bases with quality > minBaseQuality, else None
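The trim-then-mask conversion can be sketched on plain strings and integer qualities. This is an illustration under stated assumptions, not the caller's code: `SourceBases` and `convert` are hypothetical names, trimming is applied only to the trailing end of the read, and a base whose quality equals minBaseQuality is treated as usable, following the "below minBaseQuality" wording.

```scala
object TrimAndMask {
  /** Hypothetical, simplified stand-in for SourceRead. */
  final case class SourceBases(bases: String, quals: Vector[Int])

  /** End-trims Ns and low-quality bases, then masks remaining low-quality bases to N.
    * Returns None when no usable base is left. */
  def convert(bases: String, quals: Vector[Int], minBaseQuality: Int): Option[SourceBases] = {
    require(bases.length == quals.length, "bases and quals must be the same length")
    def unusable(i: Int): Boolean = bases(i) == 'N' || quals(i) < minBaseQuality

    // Walk back from the end of the read past any run of unusable bases
    var end = bases.length
    while (end > 0 && unusable(end - 1)) end -= 1

    if (end == 0) None
    else {
      // Interior low-quality bases are kept in place but masked to N
      val masked = (0 until end).map(i => if (quals(i) < minBaseQuality) 'N' else bases(i)).mkString
      Some(SourceBases(masked, quals.take(end)))
    }
  }
}
```

Masking rather than deleting interior bases keeps the remaining positions aligned across reads, which is what a per-position consensus needs.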
Returns the total number of reads filtered for any reason.
Returns the total number of input reads examined by the consensus caller so far.
if true, quality trim reads in addition to masking. If false just mask.
Creates duplex consensus reads from SamRecords that have been grouped by their source molecule but not yet by source strand.
Filters incoming bases by quality before building the duplex.
Output reads and bases are constructed only if there is at least one read from each source molecule strand. Otherwise no filtering is performed.
Note that a consequence of the above is that the output reads can be shorter than _some_ of the input reads if the input reads are of varying length; they will be the length at which there is coverage from both source strands.
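The length consequence noted above can be made concrete with a small sketch. It assumes that all reads start at the same position and that each strand's consensus extends as far as that strand's longest read; `DuplexLength` is a hypothetical helper, not part of the documented API.

```scala
object DuplexLength {
  /** Length covered by at least one read from each strand, or 0 if either strand has no reads.
    * abLengths and baLengths are the read lengths observed on the two source strands. */
  def duplexLength(abLengths: Seq[Int], baLengths: Seq[Int]): Int =
    if (abLengths.isEmpty || baLengths.isEmpty) 0
    else math.min(abLengths.max, baLengths.max)
}
```

For example, with reads of length 100 and 150 on one strand and a single read of length 120 on the other, the sketch yields a duplex length of 120, shorter than the longest input read.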