Trigger merges after recovery #113102
Conversation
We may have shut a shard down while merges were still pending (or adjusted the merge policy while the shard was down) meaning that after recovery its segments do not reflect the desired state according to the merge policy. With this commit we invoke `IndexWriter#maybeMerge()` at the end of recovery to check for, and execute, any such lost merges.
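For readers unfamiliar with the Lucene side of this, here is a minimal standalone sketch (not the PR's code; the index path, analyzer, and merge policy are placeholders) of what calling `IndexWriter#maybeMerge()` on a reopened index does: it asks the configured merge policy whether the current segments need merging and, if so, schedules those merges without waiting for them to finish.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

public class MaybeMergeSketch {
    public static void main(String[] args) throws Exception {
        // Reopen an existing index, much as a recovered shard reopens its segments.
        try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/example-index"))) {
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer())
                .setOpenMode(IndexWriterConfig.OpenMode.APPEND)
                .setMergePolicy(new TieredMergePolicy()); // possibly different from the policy in use before shutdown
            try (IndexWriter writer = new IndexWriter(dir, config)) {
                // Ask the merge policy whether the current segments warrant merging and, if so,
                // hand those merges to the merge scheduler's background threads. This returns
                // without waiting for the merges to complete.
                writer.maybeMerge();
            }
        }
    }
}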
Pinging @elastic/es-distributed (Team:Distributed)
Hi @DaveCTurner, I've created a changelog YAML for you.
LGTM, left two comments.
    return recoveryListener;
}

logger.trace(Strings.format("wrapping listener for post-recovery merge of [%s]", shardId));
Suggested change:
- logger.trace(Strings.format("wrapping listener for post-recovery merge of [%s]", shardId));
+ logger.trace(() -> Strings.format("wrapping listener for post-recovery merge of [%s]", shardId));
(same remark for other log traces)
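For context on why the supplier form was suggested (an illustrative note, not part of the diff): with the lambda overload the message is only built when TRACE is actually enabled.

// Eager: Strings.format(...) builds the message string even when TRACE is disabled.
logger.trace(Strings.format("wrapping listener for post-recovery merge of [%s]", shardId));

// Lazy: the Supplier is only evaluated when TRACE is enabled, so the formatting
// cost is skipped on the common path where trace logging is off.
logger.trace(() -> Strings.format("wrapping listener for post-recovery merge of [%s]", shardId));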
server/src/main/java/org/elasticsearch/indices/IndicesService.java (review comment, outdated and resolved)
Looks good, one question on how to ensure a flush happens.
ensureGreen(indexName);
assertBusy(() -> {
    refresh(indexName); // pick up the result of any merges
I wonder if we need to let `afterMerge` set active to true to ensure that we get a flush after a merge always? I think otherwise we risk a flush happening too early, before the merge completes (especially if the processing is lagging due to running on one thread), and then no more flush after the merge. And if we do that, we should probably change this to call `flushOnIdle` instead?
> ensure that we get a flush after a merge always?

That sounds sensible to me but I'm not sure if it could have any bad consequences. Why don't we do it already?
Note that we already flush just before snapshotting, and before relocating the shard in stateless ES. Not saying that we shouldn't try and get this to flush in `flushOnIdle` too, but maybe we can think about that in a follow-up?
Thanks, that helps. I also now realize that the check here, in elasticsearch/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java (line 2849 in a59c182):

&& System.nanoTime() - lastWriteNanos >= engineConfig.getFlushMergesAfter().nanos()) {

is more than likely to kick in if the active->inactive flush already occurred, making this more of a benign race condition problem than something that would repeatedly happen.
I wonder if it would be better to set `indices.memory.shard_inactive_time=0` in this test and then avoid the refresh here, demonstrating that it will indeed flush after the merge?
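If that route is taken, a rough sketch of how the suggestion might look in an ESIntegTestCase-based test (the class name is hypothetical, and the exact nodeSettings override signature varies across versions):

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.test.ESIntegTestCase;

public class PostRecoveryMergeIT extends ESIntegTestCase { // hypothetical test class
    @Override
    protected Settings nodeSettings(int nodeOrdinal, Settings otherSettings) {
        return Settings.builder()
            .put(super.nodeSettings(nodeOrdinal, otherSettings))
            // Treat shards as write-idle immediately, so the flush that follows a merge
            // is not deferred by the inactivity timeout.
            .put("indices.memory.shard_inactive_time", "0ms")
            .build();
    }
}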
Sure yes that makes sense.
I have a question about the throttling behaviour. I'd appreciate it if you could help me understand it better. Thanks!
    return;
}

indexShard.triggerPendingMerges();
This ultimately calls `IndexWriter#maybeMerge`, which kicks off actual merges on the "Lucene Merge Thread". It does not wait for the merges to complete. So my understanding is that we can kick off merges for multiple shards even with the throttled task runner of a single thread? Did I miss something, or maybe this is intended?
That's intended, see comment on ES-9313. I copied this info into a code comment in e70887c.
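To illustrate the intended behaviour in isolation (a generic sketch, not the PR's implementation; the TriggerMergesSketch class and Shard interface are hypothetical stand-ins, and a plain single-thread executor stands in for the throttled task runner): the single thread only serializes the cheap "schedule merges" call, while the merges themselves run on Lucene's merge threads, so merges for several shards can be in flight at once.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TriggerMergesSketch {
    // Hypothetical stand-in for a recovered shard that can ask its IndexWriter to maybeMerge().
    interface Shard {
        void triggerPendingMerges(); // cheap: schedules merges and returns immediately
    }

    static void triggerAll(List<Shard> recoveredShards) {
        ExecutorService singleThread = Executors.newSingleThreadExecutor();
        for (Shard shard : recoveredShards) {
            // Each task completes quickly because the heavy lifting happens on Lucene's
            // merge threads, so merges for many shards can overlap even though only one
            // trigger task runs here at a time.
            singleThread.execute(shard::triggerPendingMerges);
        }
        singleThread.shutdown();
    }
}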
@@ -1514,6 +1514,13 @@ public void forceMerge(ForceMergeRequest forceMerge) throws IOException {
    engine.forceMerge(forceMerge.flush(), forceMerge.maxNumSegments(), forceMerge.onlyExpungeDeletes(), forceMerge.forceMergeUUID());
}

public void triggerPendingMerges() throws IOException {
    switch (state /* single volatile read */) {
        case STARTED, POST_RECOVERY -> getEngine().forceMerge(false, ForceMergeRequest.Defaults.MAX_NUM_SEGMENTS, false, null);
Nit: can we add a comment about passing `flush=false` and `null` for the UUID? My understanding is that they do not make sense when the downstream code calls `IndexWriter#maybeMerge`, but that is not obvious from here.
Also, for my own understanding: since we don't flush actively, I guess this relies on flushes triggered by other mechanisms, such as scheduled refreshes, the indexing disk and memory controllers, and maybe other things?
More comments in c07131b.
Henning and I are still discussing how to ensure we flush the result of the merge.
LGTM.
...erTest/java/org/elasticsearch/cluster/routing/allocation/decider/DiskThresholdDeciderIT.java (review comment, resolved)
@@ -1600,7 +1600,7 @@ protected final BroadcastResponse flush(String... indices) {
  */
 protected BroadcastResponse forceMerge() {
     waitForRelocation();
-    BroadcastResponse actionGet = indicesAdmin().prepareForceMerge().setMaxNumSegments(1).get();
+    BroadcastResponse actionGet = indicesAdmin().prepareForceMerge().setMaxNumSegments(1).setFlush(true).get();
I think that is the default anyway, not sure why this change is necessary?
Ah sorry, that was left over from debugging the `DiskThresholdDeciderIT` failure, not necessary indeed. I'll remove it.