
[ML] Refactor inference request executor to leverage scheduled execution #126858


Merged

Conversation

jonathan-buttner
Contributor

@jonathan-buttner jonathan-buttner commented Apr 15, 2025

Fixes #126853

This PR refactors the RequestExecutorService to use ThreadPool.schedule instead of having a long lived thread that sleeps.
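
For illustration, here is a minimal sketch of the scheduled-execution pattern this change moves to. It uses the JDK's ScheduledExecutorService and hypothetical names (handleTasks, scheduleNextRun) purely to show the shape of the change; the actual RequestExecutorService schedules its work via ThreadPool.schedule onto the inference utility thread pool.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only: the real service uses Elasticsearch's ThreadPool,
// not a raw ScheduledExecutorService, and its queue/task handling is omitted here.
class ScheduledRequestExecutor {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean shutdown = new AtomicBoolean(false);

    void start() {
        scheduleNextRun();
    }

    private void scheduleNextRun() {
        if (shutdown.get()) {
            return; // shutting down: simply stop rescheduling
        }
        // Schedule the next poll instead of keeping a dedicated thread alive in a
        // sleep loop, so no thread is occupied while there is nothing to do.
        scheduler.schedule(this::runOnce, 50, TimeUnit.MILLISECONDS);
    }

    private void runOnce() {
        try {
            handleTasks();
        } finally {
            scheduleNextRun();
        }
    }

    private void handleTasks() {
        // Drain and execute any queued inference requests (omitted).
    }

    void shutdown() {
        shutdown.set(true);
        scheduler.shutdown();
    }
}
```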

Testing

The inference_utility thread pool should no longer report an always-active thread:

GET http://localhost:9200/_cat/thread_pool
runTask-0 analyze                                0 0 0
runTask-0 auto_complete                          0 0 0
runTask-0 azure_event_loop                       0 0 0
runTask-0 ccr                                    0 0 0
runTask-0 cluster_coordination                   0 0 0
runTask-0 downsample_indexing                    0 0 0
runTask-0 esql_worker                            0 0 0
runTask-0 fetch_shard_started                    0 0 0
runTask-0 fetch_shard_store                      0 0 0
runTask-0 flush                                  0 0 0
runTask-0 force_merge                            0 0 0
runTask-0 generic                                0 0 0
runTask-0 get                                    0 0 0
runTask-0 inference_utility                      0 0 0 <-----------
runTask-0 management                             1 0 0
runTask-0 merge                                  0 0 0
runTask-0 ml_datafeed                            0 0 0
runTask-0 ml_job_comms                           0 0 0
runTask-0 ml_native_inference_comms              0 0 0
runTask-0 ml_utility                             0 0 0
runTask-0 model_download                         0 0 0
runTask-0 profiling                              0 0 0
runTask-0 refresh                                0 0 0
runTask-0 repository_azure                       0 0 0
runTask-0 rollup_indexing                        0 0 0
runTask-0 search                                 0 0 0
runTask-0 search_coordination                    0 0 0
runTask-0 searchable_snapshots_cache_fetch_async 0 0 0
runTask-0 searchable_snapshots_cache_prewarming  0 0 0
runTask-0 security-crypto                        0 0 0
runTask-0 security-token-key                     0 0 0
runTask-0 snapshot                               0 0 0
runTask-0 snapshot_meta                          0 0 0
runTask-0 system_critical_read                   0 0 0
runTask-0 system_critical_write                  0 0 0
runTask-0 system_read                            0 0 0
runTask-0 system_write                           0 0 0
runTask-0 warmer                                 0 0 0
runTask-0 watcher                                0 0 0
runTask-0 write                                  0 0 0

Retrieving hot threads should no longer show the utility thread all the time:

GET http://localhost:9200/_nodes/hot_threads?threads=9999

@jonathan-buttner jonathan-buttner added >bug :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v8.19.0 v9.1.0 labels Apr 15, 2025
@elasticsearchmachine
Collaborator

Hi @jonathan-buttner, I've created a changelog YAML for you.

assertTrue(service.isTerminated());
}

public void testSleep_ThrowingInterruptedException_TerminatesService() throws Exception {
Contributor Author


We're no longer using a "sleeper", so we don't need this test anymore.

while (isShutdown() == false) {
handleTasks();
}
} catch (InterruptedException e) {
Contributor Author


Since we're no longer sleeping and no longer using a long-lived thread, we don't need to catch the InterruptedException.
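
A minimal before/after sketch of that point (hypothetical names, not the actual class from this PR): the old long-lived worker blocked in Thread.sleep between polls, so shutting down meant interrupting the thread and handling InterruptedException; with scheduled execution each run is a short, non-blocking task that simply declines to reschedule itself once the service is shut down.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch to contrast the two approaches; not the code from this PR.
class PollingContrast {
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    private static final long POLL_INTERVAL_MILLIS = 50;

    // Before: a long-lived worker loop that slept between polls. The blocking
    // sleep is interruptible, so shutdown required handling InterruptedException.
    void runLoop() throws InterruptedException {
        while (isShutdown() == false) {
            handleTasks();
            Thread.sleep(POLL_INTERVAL_MILLIS);
        }
    }

    // After: each poll is a short scheduled task. Nothing blocks in sleep(), so
    // there is no InterruptedException to catch; shutdown just stops rescheduling.
    void runScheduled() {
        if (isShutdown()) {
            return;
        }
        handleTasks();
        scheduleNextRun();
    }

    private boolean isShutdown() {
        return shutdown.get();
    }

    private void handleTasks() {
        // Drain and execute queued requests (omitted).
    }

    private void scheduleNextRun() {
        // In the real service this would be something like
        // threadPool.schedule(this::runScheduled, delay, executor).
    }
}
```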

@jonathan-buttner jonathan-buttner marked this pull request as ready for review April 15, 2025 17:44
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@jonathan-buttner jonathan-buttner merged commit 7a0f63c into elastic:main Apr 16, 2025
17 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-refactor-request-exec branch April 16, 2025 18:14
jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Apr 16, 2025
…ion (elastic#126858)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.18 Commit could not be cherrypicked due to conflicts
8.x
9.0 Commit could not be cherrypicked due to conflicts
8.17 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126858

jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Apr 16, 2025
…ion (elastic#126858)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Apr 16, 2025
…ion (elastic#126858)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
jonathan-buttner added a commit to jonathan-buttner/elasticsearch that referenced this pull request Apr 16, 2025
…ion (elastic#126858)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
@jonathan-buttner
Contributor Author

💚 All backports created successfully

Status Branch Result
9.0
8.18
8.17

Questions?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Apr 16, 2025
…ion (#126858) (#126948)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
elasticsearchmachine pushed a commit that referenced this pull request Apr 16, 2025
…ion (#126858) (#126950)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
elasticsearchmachine pushed a commit that referenced this pull request Apr 17, 2025
…ion (#126858) (#126949)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log

(cherry picked from commit 7a0f63c)

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/external/http/sender/HttpRequestSenderTests.java
elasticsearchmachine pushed a commit that referenced this pull request May 29, 2025
…ion (#126858) (#126946)

* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log
Labels
auto-backport Automatically create backport pull requests when merged backport pending >bug :ml Machine learning Team:ML Meta label for the ML team v8.17.6 v8.18.1 v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Inference plugin's utility threadpool executing when no tasks to run
3 participants