
[ML] Refactor inference request executor to leverage scheduled execution #126858


Merged

Conversation

jonathan-buttner
Contributor

@jonathan-buttner jonathan-buttner commented Apr 15, 2025

Fixes #126853

This PR refactors the RequestExecutorService to use ThreadPool.schedule instead of having a long lived thread that sleeps.
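
For illustration, here is a minimal sketch of the scheduled-execution pattern this change moves to. It uses the JDK's ScheduledExecutorService and hypothetical names (handleTasks, scheduleNextRun) purely to show the shape of the change; the actual RequestExecutorService schedules its work via ThreadPool.schedule onto the inference utility thread pool.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only: the real service uses Elasticsearch's ThreadPool,
// not a raw ScheduledExecutorService, and its queue/task handling is omitted here.
class ScheduledRequestExecutor {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean shutdown = new AtomicBoolean(false);

    void start() {
        scheduleNextRun();
    }

    private void scheduleNextRun() {
        if (shutdown.get()) {
            return; // shutting down: simply stop rescheduling
        }
        // Schedule the next poll instead of keeping a dedicated thread alive in a
        // sleep loop, so no thread is occupied while there is nothing to do.
        scheduler.schedule(this::runOnce, 50, TimeUnit.MILLISECONDS);
    }

    private void runOnce() {
        try {
            handleTasks();
        } finally {
            scheduleNextRun();
        }
    }

    private void handleTasks() {
        // Drain and execute any queued inference requests (omitted).
    }

    void shutdown() {
        shutdown.set(true);
        scheduler.shutdown();
    }
}
```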

Testing

The inference_utility thread pool should no longer report an always-active thread:

GET http://localhost:9200/_cat/thread_pool
runTask-0 analyze                                0 0 0
runTask-0 auto_complete                          0 0 0
runTask-0 azure_event_loop                       0 0 0
runTask-0 ccr                                    0 0 0
runTask-0 cluster_coordination                   0 0 0
runTask-0 downsample_indexing                    0 0 0
runTask-0 esql_worker                            0 0 0
runTask-0 fetch_shard_started                    0 0 0
runTask-0 fetch_shard_store                      0 0 0
runTask-0 flush                                  0 0 0
runTask-0 force_merge                            0 0 0
runTask-0 generic                                0 0 0
runTask-0 get                                    0 0 0
runTask-0 inference_utility                      0 0 0 <-----------
runTask-0 management                             1 0 0
runTask-0 merge                                  0 0 0
runTask-0 ml_datafeed                            0 0 0
runTask-0 ml_job_comms                           0 0 0
runTask-0 ml_native_inference_comms              0 0 0
runTask-0 ml_utility                             0 0 0
runTask-0 model_download                         0 0 0
runTask-0 profiling                              0 0 0
runTask-0 refresh                                0 0 0
runTask-0 repository_azure                       0 0 0
runTask-0 rollup_indexing                        0 0 0
runTask-0 search                                 0 0 0
runTask-0 search_coordination                    0 0 0
runTask-0 searchable_snapshots_cache_fetch_async 0 0 0
runTask-0 searchable_snapshots_cache_prewarming  0 0 0
runTask-0 security-crypto                        0 0 0
runTask-0 security-token-key                     0 0 0
runTask-0 snapshot                               0 0 0
runTask-0 snapshot_meta                          0 0 0
runTask-0 system_critical_read                   0 0 0
runTask-0 system_critical_write                  0 0 0
runTask-0 system_read                            0 0 0
runTask-0 system_write                           0 0 0
runTask-0 warmer                                 0 0 0
runTask-0 watcher                                0 0 0
runTask-0 write                                  0 0 0

Retrieving hot threads should no longer show the utility thread all the time:

GET http://localhost:9200/_nodes/hot_threads?threads=9999

@jonathan-buttner jonathan-buttner added >bug :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v8.19.0 v9.1.0 labels Apr 15, 2025
@elasticsearchmachine
Collaborator

Hi @jonathan-buttner, I've created a changelog YAML for you.

assertTrue(service.isTerminated());
}

public void testSleep_ThrowingInterruptedException_TerminatesService() throws Exception {
Contributor Author


We're no longer using a "sleeper", so we don't need this test anymore.

while (isShutdown() == false) {
handleTasks();
}
} catch (InterruptedException e) {
Contributor Author


Since we're no longer sleeping and no longer using a long-lived thread, we don't need to catch the InterruptedException.
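
A minimal before/after sketch of that point (hypothetical names, not the actual class from this PR): the old long-lived worker blocked in Thread.sleep between polls, so shutting down meant interrupting the thread and handling InterruptedException; with scheduled execution each run is a short, non-blocking task that simply declines to reschedule itself once the service is shut down.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch to contrast the two approaches; not the code from this PR.
class PollingContrast {
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    private static final long POLL_INTERVAL_MILLIS = 50;

    // Before: a long-lived worker loop that slept between polls. The blocking
    // sleep is interruptible, so shutdown required handling InterruptedException.
    void runLoop() throws InterruptedException {
        while (isShutdown() == false) {
            handleTasks();
            Thread.sleep(POLL_INTERVAL_MILLIS);
        }
    }

    // After: each poll is a short scheduled task. Nothing blocks in sleep(), so
    // there is no InterruptedException to catch; shutdown just stops rescheduling.
    void runScheduled() {
        if (isShutdown()) {
            return;
        }
        handleTasks();
        scheduleNextRun();
    }

    private boolean isShutdown() {
        return shutdown.get();
    }

    private void handleTasks() {
        // Drain and execute queued requests (omitted).
    }

    private void scheduleNextRun() {
        // In the real service this would be something like
        // threadPool.schedule(this::runScheduled, delay, executor).
    }
}
```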

@jonathan-buttner jonathan-buttner marked this pull request as ready for review April 15, 2025 17:44
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@jonathan-buttner jonathan-buttner merged commit 7a0f63c into elastic:main Apr 16, 2025
17 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-refactor-request-exec branch April 16, 2025 18:14
jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Apr 16, 2025
…ion (elastic#126858)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.18 Commit could not be cherrypicked due to conflicts
8.x
9.0 Commit could not be cherrypicked due to conflicts
8.17 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126858

jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Apr 16, 2025
…ion (elastic#126858)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Apr 16, 2025
…ion (elastic#126858)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Apr 16, 2025
…ion (elastic#126858)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
@jonathan-buttner
Contributor Author

💚 All backports created successfully

Status Branch Result
9.0
8.18
8.17

Questions?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Apr 16, 2025
…ion (#126858) (#126948)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
elasticsearchmachine pushed a commit that referenced this pull request Apr 16, 2025
…ion (#126858) (#126950)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
elasticsearchmachine pushed a commit that referenced this pull request Apr 17, 2025
…ion (#126858) (#126949)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
elasticsearchmachine pushed a commit that referenced this pull request May 29, 2025
…ion (#126858) (#126946)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log
Labels
auto-backport Automatically create backport pull requests when merged backport pending >bug :ml Machine learning Team:ML Meta label for the ML team v8.17.6 v8.18.1 v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Inference plugin's utility threadpool executing when no tasks to run
3 participants