Merges select segments of approximately equal size, subject to an allowed number of segments per tier. The merge policy is able to merge non-adjacent segments, and separates how many segments are merged at once from how many segments are allowed per tier. It also does not over-merge (i.e., cascade merges).
All merge policy settings are dynamic and can be updated on a live index. The merge policy has the following settings:
index.merge.policy.expunge_deletes_allowed
: When forceMergeDeletes is called, we only merge away a segment if its delete percentage is over this threshold. Default is 10.
index.merge.policy.floor_segment
: Segments smaller than this are "rounded up" to this size, i.e. treated as equal (floor) size for merge selection. This is to prevent frequent flushing of tiny segments, thus preventing a long tail in the index. Default is 2mb.
index.merge.policy.max_merge_at_once
: Maximum number of segments to be merged at a time during "normal" merging. Default is 10.
index.merge.policy.max_merged_segment
: Maximum sized segment to produce during normal merging (not explicit force merge). This setting is approximate: the estimate of the merged segment size is made by summing sizes of to-be-merged segments (compensating for percent deleted docs). Default is 5gb.
index.merge.policy.segments_per_tier
: Sets the allowed number of segments per tier. Smaller values mean more merging but fewer segments. Default is 10. Note, this value needs to be >= max_merge_at_once, otherwise you'll force too many merges to occur.
index.merge.policy.deletes_pct_allowed
: Controls the maximum percentage of deleted documents that is tolerated in the index. Lower values make the index more space efficient at the expense of increased CPU and I/O activity. Values must be between 5 and 50. Default value is 20.
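The two constraints stated in the settings list (segments_per_tier should be at least max_merge_at_once, and deletes_pct_allowed must lie between 5 and 50) can be checked before a settings update is sent. The following is a minimal sketch only: the class and method names are hypothetical and are not part of Elasticsearch, which performs its own validation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check; not an Elasticsearch API. */
final class MergeSettingsCheck {

    /** Returns human-readable problems; the list is empty when the values look consistent. */
    static List<String> validate(int maxMergeAtOnce, double segmentsPerTier, double deletesPctAllowed) {
        List<String> problems = new ArrayList<>();
        // segments_per_tier below max_merge_at_once would force too many merges.
        if (segmentsPerTier < maxMergeAtOnce) {
            problems.add("segments_per_tier (" + segmentsPerTier
                    + ") should be >= max_merge_at_once (" + maxMergeAtOnce + ")");
        }
        // deletes_pct_allowed is documented as accepting values between 5 and 50.
        if (deletesPctAllowed < 5.0 || deletesPctAllowed > 50.0) {
            problems.add("deletes_pct_allowed (" + deletesPctAllowed + ") must be between 5 and 50");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(10, 10.0, 20.0)); // the documented defaults
        System.out.println(validate(10, 5.0, 60.0));  // violates both constraints
    }
}
```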
For normal merging, the policy first computes a "budget" of how many segments are allowed to be in the index. If the index is over-budget, then the policy sorts segments by decreasing size (proportionally considering percent deletes), and then finds the least-cost merge. Merge cost is measured by a combination of the "skew" of the merge (size of the largest segment divided by the size of the smallest segment), the total merge size and the percentage of deletes reclaimed, so that merges with lower skew, smaller size and more reclaimed deletes are favored.
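To make the cost description concrete, here is a simplified sketch of a merge score in which lower is better. It follows the three factors named above (skew, total size, deletes reclaimed), but the way they are combined, the exponents 0.05 and 2.0, and the names MergeCostSketch and Seg are illustrative assumptions rather than the policy's actual formula. It assumes a non-empty candidate list and a positive floor size.

```java
import java.util.List;

/** Illustrative only; the weights below are assumptions, not the real policy. */
final class MergeCostSketch {

    /** Hypothetical segment description: on-disk size and fraction of deleted docs (0.0 to 1.0). */
    static final class Seg {
        final long sizeBytes;
        final double deletedFraction;

        Seg(long sizeBytes, double deletedFraction) {
            this.sizeBytes = sizeBytes;
            this.deletedFraction = deletedFraction;
        }

        /** Size once deleted documents are discounted. */
        double liveBytes() {
            return sizeBytes * (1.0 - deletedFraction);
        }
    }

    /** Scores a candidate merge; lower scores are preferred. */
    static double cost(List<Seg> candidate, long floorBytes) {
        double largest = 0.0;
        double smallest = Double.MAX_VALUE;
        double totalBefore = 0.0;
        double totalAfter = 0.0;
        for (Seg s : candidate) {
            // Tiny segments are rounded up to the floor size for comparison.
            double floored = Math.max(s.liveBytes(), floorBytes);
            largest = Math.max(largest, floored);
            smallest = Math.min(smallest, floored);
            totalBefore += s.sizeBytes;
            totalAfter += s.liveBytes();
        }
        double skew = largest / smallest;            // 1.0 for a perfectly balanced merge
        double liveRatio = totalAfter / totalBefore; // shrinks as more deletes are reclaimed
        return skew * Math.pow(totalAfter, 0.05) * Math.pow(liveRatio, 2.0);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        double balanced = cost(List.of(new Seg(50 * mb, 0.0), new Seg(50 * mb, 0.0)), 2 * mb);
        double skewed = cost(List.of(new Seg(98 * mb, 0.0), new Seg(2 * mb, 0.0)), 2 * mb);
        System.out.println(balanced < skewed); // balanced merges score lower
    }
}
```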
If a merge will produce a segment that's larger than max_merged_segment then the policy will merge fewer segments (down to 1 at once, if that one has deletions) to keep the segment size under budget.
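One way to picture this size cap is a greedy pass over segments sorted by decreasing size that skips any segment whose addition would push the estimated merged size past max_merged_segment. This is a sketch under stated assumptions: the class name, the method signature and the greedy skipping strategy are illustrative and may differ from what the policy actually does.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative greedy selection under a merged-size cap; names are hypothetical. */
final class SizeCappedSelection {

    /**
     * Picks up to maxMergeAtOnce sizes from liveSizesDescending, starting at startIdx,
     * whose sum stays at or below maxMergedBytes. Segments that would overflow the cap
     * are skipped so that smaller ones further down the list can still be considered.
     */
    static List<Long> pick(List<Long> liveSizesDescending, int startIdx,
                           int maxMergeAtOnce, long maxMergedBytes) {
        List<Long> picked = new ArrayList<>();
        long total = 0;
        for (int i = startIdx; i < liveSizesDescending.size() && picked.size() < maxMergeAtOnce; i++) {
            long size = liveSizesDescending.get(i);
            if (total + size > maxMergedBytes) {
                continue; // would exceed the cap; try a smaller segment instead
            }
            picked.add(size);
            total += size;
        }
        return picked;
    }

    public static void main(String[] args) {
        // Sizes in megabytes for readability; the cap is 5000 (about 5gb).
        System.out.println(pick(List.of(4000L, 3000L, 1000L, 500L), 0, 10, 5000L));
    }
}
```

With the sizes in main, the 3000 segment is skipped because 4000 + 3000 exceeds the cap, so the merge ends up with fewer segments than max_merge_at_once would otherwise allow.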
Note that for large shards holding many gigabytes of data, the default max_merged_segment (5gb) can leave many segments in an index and make searches slower. Use the indices segments API to see the segments that an index has, and consider either increasing max_merged_segment or issuing an optimize call for the index (preferably during a low-traffic period).
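As a rough back-of-the-envelope check, the size cap implies a lower bound on how many segments a large shard will keep: if normal merging never produces a segment larger than max_merged_segment, a shard cannot settle below its size divided by that cap. The helper below is a hypothetical sketch of that arithmetic; real segment counts are typically higher because smaller tiers also hold segments and the size estimate is approximate.

```java
/** Hypothetical helper for estimating a lower bound on segment count. */
final class SegmentCountEstimate {

    /** Smallest number of segments a shard can reach if no segment exceeds maxMergedBytes. */
    static long minSegments(long shardBytes, long maxMergedBytes) {
        if (shardBytes <= 0) {
            return 0;
        }
        return (shardBytes + maxMergedBytes - 1) / maxMergedBytes; // ceiling division
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // A 100gb shard with the default 5gb cap keeps at least 20 segments.
        System.out.println(minSegments(100 * gb, 5 * gb));
    }
}
```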
-
Nested Class Summary
Nested Classes
static class MergePolicyConfig.CompoundFileThreshold
static enum MergePolicyConfig.Type
-
Field Summary
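Fields
static final double DEFAULT_DELETES_PCT_ALLOWED
static final double DEFAULT_EXPUNGE_DELETES_ALLOWED
static final ByteSizeValue DEFAULT_FLOOR_SEGMENT
static final int DEFAULT_MAX_MERGE_AT_ONCE
static final ByteSizeValue DEFAULT_MAX_MERGED_SEGMENT
static final ByteSizeValue DEFAULT_MAX_TIME_BASED_MERGED_SEGMENT
: Time-based data generally gets rolled over, so there is not much value in enforcing a maximum segment size, which has the side effect of merging fewer segments together than the merge factor, which in turn increases write amplification.
static final int DEFAULT_MERGE_FACTOR
: A default value for LogByteSizeMergePolicy's merge factor: 32.
static final double DEFAULT_SEGMENTS_PER_TIER
static final Setting<MergePolicyConfig.CompoundFileThreshold> INDEX_COMPOUND_FORMAT_SETTING
static final String INDEX_MERGE_ENABLED
static final Setting<ByteSizeValue> INDEX_MERGE_POLICY_FLOOR_SEGMENT_SETTING
static final Setting<ByteSizeValue> INDEX_MERGE_POLICY_MAX_MERGED_SEGMENT_SETTING
static final Setting<MergePolicyConfig.Type> INDEX_MERGE_POLICY_TYPE_SETTING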
-
Method Summary
-
Field Details
-
DEFAULT_EXPUNGE_DELETES_ALLOWED
public static final double DEFAULT_EXPUNGE_DELETES_ALLOWED
-
DEFAULT_FLOOR_SEGMENT
-
DEFAULT_MAX_MERGE_AT_ONCE
public static final int DEFAULT_MAX_MERGE_AT_ONCE
-
DEFAULT_MAX_MERGED_SEGMENT
-
DEFAULT_MAX_TIME_BASED_MERGED_SEGMENT
Time-based data generally gets rolled over, so there is not much value in enforcing a maximum segment size, which has the side effect of merging fewer segments together than the merge factor, which in turn increases write amplification. So we set an arbitrarily high roof that serves as a protection that we expect to never hit.
DEFAULT_SEGMENTS_PER_TIER
public static final double DEFAULT_SEGMENTS_PER_TIER
-
DEFAULT_MERGE_FACTOR
public static final int DEFAULT_MERGE_FACTOR
A default value for LogByteSizeMergePolicy's merge factor: 32. This default value differs from the Lucene default of 10 in order to account for the fact that Elasticsearch uses LogByteSizeMergePolicy for time-based data, where adjacent segment merging ensures that segments have mostly non-overlapping time ranges if data gets ingested in timestamp order. In turn, this allows range queries on the timestamp to remain efficient with high numbers of segments since most segments either don't match the query range or are fully contained by the query range.
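The benefit for range queries can be illustrated by classifying each segment's timestamp range against the query range. Segments that are disjoint from the query can be skipped, and segments fully inside it match without per-document checks; only segments that cross a query boundary need finer work. The sketch below is illustrative: the class, enum and method names are hypothetical and do not correspond to an Elasticsearch or Lucene API.

```java
/** Illustrative classification of a segment's time range against a query range. */
final class TimeRangePruning {

    enum Relation { DISJOINT, WITHIN, CROSSES }

    /** Compares a segment's [segMin, segMax] range with a query range [qMin, qMax]; all bounds inclusive. */
    static Relation relate(long segMin, long segMax, long qMin, long qMax) {
        if (segMax < qMin || segMin > qMax) {
            return Relation.DISJOINT; // no document in the segment can match
        }
        if (segMin >= qMin && segMax <= qMax) {
            return Relation.WITHIN;   // every document in the segment matches
        }
        return Relation.CROSSES;      // only this case needs per-document filtering
    }

    public static void main(String[] args) {
        // Three segments with non-overlapping time ranges and a query for [10, 20].
        System.out.println(relate(0, 9, 10, 20));
        System.out.println(relate(10, 15, 10, 20));
        System.out.println(relate(16, 25, 10, 20));
    }
}
```

When segments have mostly non-overlapping ranges, at most two of them can cross the boundaries of a given query range, no matter how many segments the index holds.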
-
DEFAULT_DELETES_PCT_ALLOWED
public static final double DEFAULT_DELETES_PCT_ALLOWED
-
INDEX_COMPOUND_FORMAT_SETTING
-
INDEX_MERGE_POLICY_TYPE_SETTING
-
INDEX_MERGE_POLICY_EXPUNGE_DELETES_ALLOWED_SETTING
-
INDEX_MERGE_POLICY_FLOOR_SEGMENT_SETTING
-
INDEX_MERGE_POLICY_MAX_MERGE_AT_ONCE_SETTING
-
INDEX_MERGE_POLICY_MAX_MERGE_AT_ONCE_EXPLICIT_SETTING
-
INDEX_MERGE_POLICY_MAX_MERGED_SEGMENT_SETTING
-
INDEX_MERGE_POLICY_SEGMENTS_PER_TIER_SETTING
-
INDEX_MERGE_POLICY_MERGE_FACTOR_SETTING
-
INDEX_MERGE_POLICY_DELETES_PCT_ALLOWED_SETTING
-
INDEX_MERGE_ENABLED
-