
Conversation

@astefan (Contributor) commented Feb 28, 2022

In case of an exception (a partial shard failure, for example), the circuit breaker's used-bytes tracker was not cleared. The same goes for some of the data structures used by the EQL sequence algorithm.
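The change wraps the result listener so that the failure branch also cleans up, mirroring the shape visible in the diff further down. A minimal sketch of that pattern, where circuitBreaker, trackedBytes and sequenceState are illustrative names rather than the actual fields:

// hypothetical sketch: on failure, give back the bytes reserved on the breaker,
// drop the per-sequence state, then propagate the exception
ActionListener<Payload> wrapped = ActionListener.wrap(listener::onResponse, e -> {
    circuitBreaker.addWithoutBreaking(-trackedBytes);
    sequenceState.clear();
    listener.onFailure(e);
});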

@elasticmachine (Collaborator)

Pinging @elastic/es-ql (Team:QL)

@Luegg (Contributor) left a comment

👍 to the fix itself but I left some comments concerning the test.

The relative complexity of the test also made me wonder whether it might be better written as an integration test, using a search plugin similar to SearchBlockPlugin to emulate the search failure.

}

private void assertMemoryCleared(int sequenceFiltersCount, PIT_ID_TYPE failureId) {
List<BreakerSettings> eqlBreakerSettings = Collections.singletonList(
Contributor

Why do you need the original EQL circuit breaker here? Couldn't the test also use something like EqlTestCircuitBreaker that just accumulates the values? This would also allow keeping CIRCUIT_BREAKER_NAME etc. private.
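For illustration, such a test breaker could extend NoopCircuitBreaker and simply record the deltas. A rough sketch under that assumption (only the accumulating method is shown; EqlTestCircuitBreaker is a hypothetical class suggested above, not one that exists in the codebase):

import java.util.concurrent.atomic.AtomicLong;
import org.elasticsearch.common.breaker.NoopCircuitBreaker;

// hypothetical accumulating breaker for the test: never trips, only records usage
class EqlTestCircuitBreaker extends NoopCircuitBreaker {
    final AtomicLong used = new AtomicLong();

    EqlTestCircuitBreaker() {
        super("eql_test");
    }

    @Override
    public long addWithoutBreaking(long bytes) {
        // record reservations and releases; the estimate-and-break variant would be overridden similarly
        return used.addAndGet(bytes);
    }
}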

Contributor Author

Using the test breaker doesn't reflect the original breaker's behavior, and these tests depend on that behavior (not fully, though: they don't check the exact calculations it does).

assertMemoryCleared(stages, PIT_ID_TYPE.FAILURE_ID);
}

private void assertMemoryCleared(int sequenceFiltersCount, PIT_ID_TYPE failureId) {
Contributor

failureId is never used

Contributor Author

Nicely spotted.

eqlBreakerSettings,
new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)
);
PitMockClient esClient = new PitMockClient(getTestName(), service.getBreaker(CIRCUIT_BREAKER_NAME), PIT_ID_TYPE.FAILURE_ID)
Contributor

Controlling the behavior of the client through the PIT ID seems surprising to me. Wouldn't it be more straightforward to use a flag or different instances?

Contributor Author

That specific line you commented on should in fact be (as you spotted earlier, failureId is not used): PitMockClient esClient = new PitMockClient(getTestName(), service.getBreaker(CIRCUIT_BREAKER_NAME), failureId).

I found that the value of the PIT ID is simple and flexible enough to allow me to run two different tests (success and failure) and also to control the behavior of the client's responses across multiple requests for one of the PIT ID types.

Can you detail and/or suggest more specific code changes to use a flag or different (client, I assume) instances?

Contributor

What I'm mostly thinking of is to separate the various concerns packed into the PIT ID right now:

  • What type of client is used by the test (failing vs. succeeding)
  • Keeping track of the client state (how many requests have been answered)
  • Being the PIT ID itself

Instantiating PitMockClient with a succeeding flag (or using two different implementations) can address the first concern. I'm not entirely sure about the client state but that can probably be addressed by a field in the client. This would also resolve the issue mentioned in https://github.com/elastic/elasticsearch/pull/84451/files#r816743228.
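Something along these lines, for instance. Purely illustrative: the flag, the counter field and this constructor shape are not in the current test, they are a sketch of the separation described above:

// hypothetical shape: success/failure comes from an explicit flag, request count lives in a field
private final CircuitBreaker breaker;
private final boolean succeedSearches;
private final AtomicInteger searchRequests = new AtomicInteger();

PitMockClient(String testName, CircuitBreaker breaker, boolean succeedSearches) {
    super(testName);
    this.breaker = breaker;
    this.succeedSearches = succeedSearches;
}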

ShardSearchFailure.EMPTY_ARRAY,
SearchResponse.Clusters.EMPTY,
// copy the pit from the request and "increment" its "+" sign
requestPitId.equals(PIT_ID_TYPE.FAILURE_ID.pitId) ? requestPitId + "+" : null
Contributor

I guess this works because EQL pipes the PIT id from the previous response into the next request. That's not a requirement of the PIT APIs though. A PIT user might also use the PIT id returned from the initial OpenPointInTimeRequest for all subsequent search requests. Not that this invalidates the test but it's imposing a certain use of the PIT APIs on the implementation.

Couldn't the client internally keep track of the number of requests instead?
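For example, the mock could hand out one fixed PIT id and decide from an internal counter which response carries the shard failure. A sketch only, reusing the hypothetical succeedSearches flag and searchRequests counter from the earlier suggestion:

// hypothetical: fail only the second search request, regardless of which PIT id the caller sends back
if (request instanceof SearchRequest) {
    boolean failThisOne = succeedSearches == false && searchRequests.incrementAndGet() == 2;
    // build the SearchResponse with or without shard failures based on failThisOne,
    // always returning the same PIT id that the OpenPointInTimeRequest handed out
}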

pitContextCounter.incrementAndGet();
OpenPointInTimeResponse response = new OpenPointInTimeResponse(pitId);
listener.onResponse((Response) response);
return;
Contributor

The multiple return statements can also be avoided by moving super.doExecute(action, request, listener); below into an else branch. This would also reduce the risk of accidentally calling the listener twice.
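Roughly this shape, using the constructors already present in the test; the search-request branch is omitted and the close branch is a guess at the test's behaviour:

if (request instanceof OpenPointInTimeRequest) {
    pitContextCounter.incrementAndGet();
    listener.onResponse((Response) new OpenPointInTimeResponse(pitId));
} else if (request instanceof ClosePointInTimeRequest) {
    pitContextCounter.decrementAndGet();
    listener.onResponse((Response) new ClosePointInTimeResponse(true, 1));
} else {
    // single fall-through, no early returns: the listener is completed exactly once per branch
    super.doExecute(action, request, listener);
}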

@costin (Member) left a comment

Left a comment about centralizing the call. Please add some comments to the test describing what it tries to do and, more importantly, how it goes about it and how it is set up.

log.trace("Starting sequence window w/ fetch size [{}]", windowSize);
startTime = System.currentTimeMillis();
tumbleWindow(0, listener);
tumbleWindow(0, wrap(listener::onResponse, e -> {
Member

I prefer the close method to be called in one central place: right now it's already called in two places, and this fix adds a third one for the exception branch.
There should be only one call; consider even removing the method itself so it cannot be reused.
Something along the lines of:

tumbleWindow(0, runAfter(listener, close(listener)));

This should guarantee that after the listener executes, the clean-up is done both on onResponse and onFailure. It's worth checking whether onFailure can be called multiple times and, if not, changing the close() method to not forward the exception.
Also, I would remove the close(listener) method and simply inline its code in the lambda above, to avoid it being called from different places, since that shouldn't be the case.
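For reference, a compiling variant of that suggestion could look like the following; this is hypothetical, with the lambda body standing in for the inlined close() logic:

// wrap the listener once so the clean-up runs after both onResponse and onFailure
tumbleWindow(0, ActionListener.runAfter(listener, () -> {
    // inline the former close() body here: release breaker-tracked bytes,
    // clear the sequence state and close the point in time
}));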

@astefan (Contributor Author) commented Mar 7, 2022

@elasticmachine update branch

@astefan astefan requested review from costin and Luegg March 8, 2022 07:17
@costin (Member) left a comment

LGTM

@bpintea (Contributor) left a comment

nice testing, lgtm.

listener.onResponse((Response) response);
} else if (request instanceof ClosePointInTimeRequest) {
ClosePointInTimeResponse response = new ClosePointInTimeResponse(true, 1);
assert pitContextCounter.get() > 0;
Contributor

did you choose the assert (here and below) intentionally?

Contributor Author

I thought of it as a stronger statement about the pit opening/closing.
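For context, a plain assert in the mock only fires when the JVM runs with assertions enabled (which the Elasticsearch test runner does), whereas a Hamcrest assertion would fail unconditionally, e.g.:

assertThat(pitContextCounter.get(), greaterThan(0));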

@astefan astefan merged commit 4c243d6 into elastic:master Mar 9, 2022
@astefan astefan deleted the clear_circuit_break_mem branch March 9, 2022 12:43
@astefan astefan added the auto-backport Automatically create backport pull requests when merged label Mar 9, 2022
astefan added a commit that referenced this pull request Mar 9, 2022
…r used bytes in case of exception (#84451)

(cherry picked from commit 4c243d6)
astefan added a commit that referenced this pull request Mar 9, 2022
…r used bytes in case of exception (#84451)

(cherry picked from commit 4c243d6)
astefan added a commit that referenced this pull request Mar 9, 2022
…r used bytes in case of exception (#84451) (#84804)

(cherry picked from commit 4c243d6)
astefan added a commit that referenced this pull request Mar 9, 2022
…r used bytes in case of exception (#84451) (#84803)

(cherry picked from commit 4c243d6)
@astefan astefan added v7.17.2 and removed auto-backport Automatically create backport pull requests when merged labels Mar 9, 2022
astefan added a commit to astefan/elasticsearch that referenced this pull request Mar 9, 2022
…r used bytes in case of exception (elastic#84451)

(cherry picked from commit 4c243d6)
astefan added a commit that referenced this pull request Mar 10, 2022
…r used bytes in case of exception (#84451) (#84821)

(cherry picked from commit 4c243d6)