EQL: Clean any used memory by the sequence matcher and circuit breaker used bytes in case of exception #84451
Conversation
Pinging @elastic/es-ql (Team:QL)
👍 to the fix itself but I left some comments concerning the test.
The relative complexity of the test also made me wonder whether it might be better written as an integration test using a search plugin similar to SearchBlockPlugin to emulate the search failure?
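For illustration, such a plugin could hook the stock SearchOperationListener extension point to make the query phase throw; a minimal sketch (the class name is hypothetical, and this is not the actual SearchBlockPlugin, which blocks rather than fails searches):

    import org.elasticsearch.index.IndexModule;
    import org.elasticsearch.index.shard.SearchOperationListener;
    import org.elasticsearch.plugins.Plugin;
    import org.elasticsearch.search.internal.SearchContext;

    // Hypothetical plugin making every query phase throw, to emulate a shard search failure.
    public class FailingSearchPlugin extends Plugin {
        @Override
        public void onIndexModule(IndexModule indexModule) {
            indexModule.addSearchOperationListener(new SearchOperationListener() {
                @Override
                public void onPreQueryPhase(SearchContext searchContext) {
                    throw new IllegalStateException("simulated search failure");
                }
            });
        }
    }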
}

private void assertMemoryCleared(int sequenceFiltersCount, PIT_ID_TYPE failureId) {
    List<BreakerSettings> eqlBreakerSettings = Collections.singletonList(
Why do you need the original EQL circuit breaker here? Couldn't the test also use something like EqlTestCircuitBreaker that just accumulates the values? This would also allow keeping CIRCUIT_BREAKER_NAME etc. private.
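For reference, such a breaker could extend NoopCircuitBreaker and merely do the accounting; a sketch, assuming the 8.x CircuitBreaker interface where the add* methods return void:

    import java.util.concurrent.atomic.AtomicLong;
    import org.elasticsearch.common.breaker.NoopCircuitBreaker;

    // Hypothetical test breaker: no limit, no tripping, just accumulation.
    class EqlTestCircuitBreaker extends NoopCircuitBreaker {
        final AtomicLong used = new AtomicLong();

        EqlTestCircuitBreaker() {
            super("eql_test");
        }

        @Override
        public void addEstimateBytesAndMaybeBreak(long bytes, String label) {
            used.addAndGet(bytes);
        }

        @Override
        public void addWithoutBreaking(long bytes) {
            used.addAndGet(bytes);
        }

        @Override
        public long getUsed() {
            return used.get();
        }
    }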
Using a test breaker wouldn't reflect the original breaker's behavior, and these tests depend on that behavior (though not fully, i.e. without checking the exact calculations it performs).
    assertMemoryCleared(stages, PIT_ID_TYPE.FAILURE_ID);
}

private void assertMemoryCleared(int sequenceFiltersCount, PIT_ID_TYPE failureId) {
failureId is never used.
Nicely spotted.
    eqlBreakerSettings,
    new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
);
PitMockClient esClient = new PitMockClient(getTestName(), service.getBreaker(CIRCUIT_BREAKER_NAME), PIT_ID_TYPE.FAILURE_ID)
Controlling the behavior of the client through the PIT ID seems surprising to me. Wouldn't it be more straightforward to use a flag or different instances?
That specific line you added the comment for should in fact be (as you spotted earlier, failureId is not used): PitMockClient esClient = new PitMockClient(getTestName(), service.getBreaker(CIRCUIT_BREAKER_NAME), failureId).
I found the value of the PIT ID simple and flexible enough to let me run two different tests (success and failure) and also to control the behavior of the client response across multiple requests for one of the PIT ID types.
Can you detail and/or suggest more specific code changes to use a flag or different (client, I assume) instances?
What I'm mostly thinking of is to separate the various concerns packed into the PIT ID right now:
- What type of client is used by the test (failing vs. succeeding)
- Keeping track of the client state (how many requests have been answered)
- Being the PIT ID itself
Instantiating PitMockClient with a succeeding flag (or using two different implementations) can address the first concern. I'm not entirely sure about the client state, but that can probably be addressed by a field in the client. This would also resolve the issue mentioned in https://github.com/elastic/elasticsearch/pull/84451/files#r816743228.
    ShardSearchFailure.EMPTY_ARRAY,
    SearchResponse.Clusters.EMPTY,
    // copy the pit from the request and "increment" its "+" sign
    requestPitId.equals(PIT_ID_TYPE.FAILURE_ID.pitId) ? requestPitId + "+" : null
I guess this works because EQL pipes the PIT id from the previous response into the next request. That's not a requirement of the PIT APIs though: a PIT user might also use the PIT id returned from the initial OpenPointInTimeRequest for all subsequent search requests. Not that this invalidates the test, but it imposes a certain use of the PIT APIs on the implementation.
Couldn't the client internally keep track of the number of requests instead?
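For example, something along these lines inside the mock's doExecute, with searchRequests being a counter field on the client (illustrative, not the PR's code):

    // Fail e.g. the second search request, independent of which PIT id the caller sends.
    if (request instanceof SearchRequest && searchRequests.incrementAndGet() == 2) {
        listener.onFailure(new IllegalStateException("simulated partial shard failure"));
        return;
    }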
    pitContextCounter.incrementAndGet();
    OpenPointInTimeResponse response = new OpenPointInTimeResponse(pitId);
    listener.onResponse((Response) response);
    return;
The multiple return statements can also be avoided by moving super.doExecute(action, request, listener); below into an else branch. This would also reduce the risk of accidentally calling the listener twice.
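Roughly like this (the branch bodies follow the snippets quoted above; the decrement in the close branch is an assumption about the test's bookkeeping):

    if (request instanceof OpenPointInTimeRequest) {
        pitContextCounter.incrementAndGet();
        listener.onResponse((Response) new OpenPointInTimeResponse(pitId));
    } else if (request instanceof ClosePointInTimeRequest) {
        assert pitContextCounter.get() > 0;
        pitContextCounter.decrementAndGet();
        listener.onResponse((Response) new ClosePointInTimeResponse(true, 1));
    } else {
        // single fall-through point; no early returns that could double-call the listener
        super.doExecute(action, request, listener);
    }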
Left a comment about centralizing the call. Please also add some comments to the test describing what it tries to do and, moreover, how it goes about it and how it is set up.
    log.trace("Starting sequence window w/ fetch size [{}]", windowSize);
    startTime = System.currentTimeMillis();
    tumbleWindow(0, listener);
    tumbleWindow(0, wrap(listener::onResponse, e -> {
I prefer the close method to be called in one central place - right now it's already called in two places and this fix adds a third one for the exception branch.
There should be only one call; ideally even remove the method itself to avoid having it reused.
Something along the lines of:
tumbleWindow(0, runAfter(listener, close(listener)));
This should guarantee that after the listener executes, the clean-up is done both on onResponse and on onFailure. It's worth checking whether onFailure can be called multiple times and, if not, changing the close() method to not forward the exception.
Also, I would remove the close(listener) method and simply inline its code in the lambda above, to avoid it being called from different places, since that shouldn't be the case.
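Assuming ActionListener.runAfter (which takes a delegate listener and a Runnable) and illustrative field names, the centralized clean-up could look like:

    // Run the clean-up exactly once, after either onResponse or onFailure has fired.
    tumbleWindow(0, ActionListener.runAfter(listener, () -> {
        matcher.clear();                                   // drop in-memory sequence state
        circuitBreaker.addWithoutBreaking(-reservedBytes); // release the tracked bytes
    }));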
@elasticmachine update branch
LGTM
nice testing, lgtm.
    listener.onResponse((Response) response);
} else if (request instanceof ClosePointInTimeRequest) {
    ClosePointInTimeResponse response = new ClosePointInTimeResponse(true, 1);
    assert pitContextCounter.get() > 0;
did you choose the assert (here and below) intentionally?
I thought of it as a stronger statement about the PIT opening/closing.
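For context: a plain Java assert only fires when assertions are enabled via -ea (which Gradle test tasks do by default), whereas a test assertion fails unconditionally, e.g.:

    assertThat(pitContextCounter.get(), greaterThan(0)); // org.hamcrest.Matchers.greaterThan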
EQL: Clean any used memory by the sequence matcher and circuit breaker used bytes in case of exception (elastic#84451) (cherry picked from commit 4c243d6)
In case of an exception (a partial shard failure, for example) the circuit breaker's used-bytes tracker was not cleared. The same goes for some data structures used by the EQL sequence algorithm.
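Schematically, the fix amounts to releasing the tracked bytes and clearing the sequence state on every exit path, not only on success; a hedged sketch with illustrative names, not the actual PR code:

    import org.elasticsearch.common.breaker.CircuitBreaker;

    // Illustrative shape of the fix: release on every exit path.
    void runWithAccounting(CircuitBreaker breaker, long reservedBytes) {
        breaker.addEstimateBytesAndMaybeBreak(reservedBytes, "eql_sequence");
        try {
            runSequenceAlgorithm();
        } finally {
            clearSequenceState();                       // free the matcher's data structures
            breaker.addWithoutBreaking(-reservedBytes); // zero out the used-bytes tracker
        }
    }

    void runSequenceAlgorithm() { /* hypothetical stand-in for the matcher's work */ }

    void clearSequenceState() { /* hypothetical stand-in for clearing matcher structures */ }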