Class TooManyDeletesCompactionStrategy


  • public class TooManyDeletesCompactionStrategy
    extends DefaultCompactionStrategy
    This compaction strategy works in concert with the DeletesSummarizer. Using the statistics from DeletesSummarizer, this strategy will compact all files in a table when the ratio of deletes to non-deletes exceeds a threshold.

    This strategy has two options. First, the "threshold" option sets the point at which a compaction will be triggered. This option defaults to ".25" and must be in the range (0.0, 1.0]. The second option is "proceed_zero_no_summary", which determines whether the strategy should proceed when a bulk imported file has no summary information.
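
    The two options above could be read and validated roughly as follows. This is a minimal, self-contained sketch and not the strategy's actual source: the class name TooManyDeletesOptionsSketch and its helper methods are hypothetical, and only the option names, defaults, and the (0.0, 1.0] range come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Accumulo: illustrates option parsing only.
class TooManyDeletesOptionsSketch {
  // Option names and defaults as documented for this strategy.
  static final String THRESHOLD_OPT = "threshold";
  static final String THRESHOLD_DEFAULT = ".25";
  static final String PROCEED_ZERO_OPT = "proceed_zero_no_summary";
  static final String PROCEED_ZERO_DEFAULT = "false";

  // Parses "threshold" and enforces the documented range (0.0, 1.0].
  static double parseThreshold(Map<String,String> options) {
    double threshold =
        Double.parseDouble(options.getOrDefault(THRESHOLD_OPT, THRESHOLD_DEFAULT));
    // Written as a negated range check so that NaN is rejected as well.
    if (!(threshold > 0.0 && threshold <= 1.0)) {
      throw new IllegalArgumentException(
          "threshold must be in the range (0.0, 1.0], saw " + threshold);
    }
    return threshold;
  }

  // Parses "proceed_zero_no_summary"; anything other than "true" is false.
  static boolean parseProceedZeroNoSummary(Map<String,String> options) {
    return Boolean.parseBoolean(
        options.getOrDefault(PROCEED_ZERO_OPT, PROCEED_ZERO_DEFAULT));
  }

  public static void main(String[] args) {
    Map<String,String> options = new HashMap<>();
    options.put(THRESHOLD_OPT, ".4");
    options.put(PROCEED_ZERO_OPT, "true");
    System.out.println(parseThreshold(options));
    System.out.println(parseProceedZeroNoSummary(options));
  }
}
```

    Rejecting an out-of-range threshold up front, rather than clamping it, is a design choice of this sketch; it keeps a mistyped value from silently changing when compactions are triggered.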

    If the delete summarizer was configured on a table that already had files, then those files will have no summary information. This strategy can still proceed in this situation. In this case it falls back to using Accumulo's estimated entries per file. For the files without summary information, the estimated number of deletes will be zero. This fallback method will underestimate deletes, which will not lead to false positives, except for the case of bulk imported files. Accumulo estimates that bulk imported files have zero entries. The second option, "proceed_zero_no_summary", determines whether this strategy should proceed when it sees bulk imported files that do not have summary data. This option defaults to "false".
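
    The fallback described above can be illustrated with a small sketch. It is not the strategy's implementation: the FileInfo type, the shouldCompact method, and the handling of the edge case where there are no non-deletes are assumptions made for illustration. Only the rules themselves (files without a summary contribute their estimated entries and zero deletes, and files with zero estimated entries block the decision unless "proceed_zero_no_summary" is true) come from the text.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the fallback estimate; none of these types are Accumulo's.
class DeleteEstimateSketch {

  // Per-file view used by this sketch only.
  static final class FileInfo {
    final boolean hasSummary;
    final long total;   // summary total, or Accumulo's estimated entries
    final long deletes; // summary deletes, or zero when there is no summary

    private FileInfo(boolean hasSummary, long total, long deletes) {
      this.hasSummary = hasSummary;
      this.total = total;
      this.deletes = deletes;
    }

    static FileInfo withSummary(long total, long deletes) {
      return new FileInfo(true, total, deletes);
    }

    // No summary: deletes are assumed to be zero, which underestimates them.
    static FileInfo withoutSummary(long estimatedEntries) {
      return new FileInfo(false, estimatedEntries, 0);
    }
  }

  // Returns true when the deletes/non-deletes ratio exceeds the threshold.
  static boolean shouldCompact(List<FileInfo> files, double threshold,
      boolean proceedZeroNoSummary) {
    long total = 0;
    long deletes = 0;
    for (FileInfo f : files) {
      // Zero estimated entries and no summary: treated here as a bulk imported file.
      if (!f.hasSummary && f.total == 0 && !proceedZeroNoSummary) {
        return false; // do not decide based on deletes
      }
      total += f.total;
      deletes += f.deletes;
    }
    long nonDeletes = total - deletes;
    if (nonDeletes <= 0) {
      // Assumption of this sketch: only deletes present counts as exceeding.
      return deletes > 0;
    }
    return (double) deletes / nonDeletes > threshold;
  }

  public static void main(String[] args) {
    List<FileInfo> files = Arrays.asList(
        FileInfo.withSummary(100, 30),  // summarized file
        FileInfo.withoutSummary(1000)); // pre-existing file, estimate only
    System.out.println(shouldCompact(files, 0.25, false));
  }
}
```

    In this sketch a return value of false only means the delete-based check did not trigger; the real strategy would then defer to DefaultCompactionStrategy.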

    Bulk files can be generated with summary information by calling AccumuloFileOutputFormat#setSummarizers(JobConf, SummarizerConfiguration...) or RFile.WriterOptions.withSummarizers(SummarizerConfiguration...).

    When this strategy does not decide to compact based on the number of deletes, then it will defer the decision to the DefaultCompactionStrategy.

    Configuring this compaction strategy for a table will cause it to always queue compactions, even though it may not decide to compact. These queued compactions may show up on the Accumulo monitor page. This is because summary data cannot be read until after a compaction is queued and dequeued. When the compaction is dequeued, the strategy can then decide not to compact. See ACCUMULO-4573.

    Since:
    2.0.0