Fix get all inference endponts not returning multiple endpoints sharing model deployment #121821

dan-rubinstein · 2025-02-05T21:07:09Z

Description

Issue - https://github.com/elastic/ml-team/issues/1470?reload=1?reload=1

The get all inference endpoints API currently has a bug: it returns only a single endpoint for each model deployment. After we retrieve all the endpoints from the inference index, we call the deployment stats API so that we return the current num_allocations accurately, but in doing so we accidentally filter out all but the last retrieved inference endpoint for each model deployment. This change updates the logic to properly handle multiple endpoints for a single model deployment.
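The overwrite described above can be reproduced in isolation. The following is a minimal sketch, not the Elasticsearch code: `DeploymentIndexSketch`, `Endpoint`, `indexWithPut`, and `indexWithLists` are hypothetical names used only for illustration. It assumes the bug comes from keying a map on the deployment ID with a single value per key, as the diff in the review thread suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DeploymentIndexSketch {
    // Hypothetical stand-in for an inference endpoint backed by a model deployment.
    record Endpoint(String inferenceId, String deploymentId) {}

    // Buggy shape: one value per deployment ID, so a later endpoint
    // replaces any earlier endpoint that shares the same deployment.
    static Map<String, Endpoint> indexWithPut(List<Endpoint> endpoints) {
        Map<String, Endpoint> byDeployment = new HashMap<>();
        for (Endpoint endpoint : endpoints) {
            byDeployment.put(endpoint.deploymentId(), endpoint);
        }
        return byDeployment;
    }

    // Fixed shape: a list per deployment ID keeps every endpoint.
    static Map<String, List<Endpoint>> indexWithLists(List<Endpoint> endpoints) {
        Map<String, List<Endpoint>> byDeployment = new HashMap<>();
        for (Endpoint endpoint : endpoints) {
            byDeployment.computeIfAbsent(endpoint.deploymentId(), k -> new ArrayList<>()).add(endpoint);
        }
        return byDeployment;
    }
}
```

With two endpoints on the same deployment, the first method keeps only the last one inserted, while the second keeps both.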

Testing

Unit tests
Locally tested by creating multiple endpoints for a single model deployment and confirming that both endpoints are returned. Also confirmed that updating a single endpoint's num_allocations is reflected in all endpoints sharing the same model deployment when calling get all inference endpoints.
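The second local check, that a num_allocations update shows up on every endpoint sharing a deployment, can be pictured with the sketch below. This is an assumption about the general shape of the logic, not the actual implementation: `AllocationSyncSketch`, `Endpoint`, and `applyStats` are hypothetical, and the deployment stats API response is reduced to a plain map from deployment ID to allocation count.

```java
import java.util.List;
import java.util.Map;

final class AllocationSyncSketch {
    // Hypothetical mutable stand-in for an endpoint and its allocation setting.
    static final class Endpoint {
        final String inferenceId;
        final String deploymentId;
        int numAllocations;

        Endpoint(String inferenceId, String deploymentId, int numAllocations) {
            this.inferenceId = inferenceId;
            this.deploymentId = deploymentId;
            this.numAllocations = numAllocations;
        }
    }

    // Copies the allocation count reported for each deployment onto
    // every endpoint grouped under that deployment ID.
    static void applyStats(
        Map<String, List<Endpoint>> endpointsByDeployment,
        Map<String, Integer> allocationsByDeployment
    ) {
        for (Map.Entry<String, Integer> entry : allocationsByDeployment.entrySet()) {
            List<Endpoint> endpoints = endpointsByDeployment.getOrDefault(entry.getKey(), List.of());
            for (Endpoint endpoint : endpoints) {
                endpoint.numAllocations = entry.getValue();
            }
        }
    }
}
```

Endpoints whose deployment has no reported stats are left unchanged in this sketch.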


elasticsearchmachine · 2025-02-05T21:07:34Z

Hi @dan-rubinstein, I've created a changelog YAML for you.

elasticsearchmachine · 2025-02-06T15:46:59Z

Pinging @elastic/ml-core (Team:ML)

jonathan-buttner · 2025-02-06T17:42:39Z

Hey Dan 👋 do you want this change to go to 8.18.0? I see it's labeled for 9.0.0. 8.18 and 9.0 are being released together so if we're targeting one we should probably do both.

jonathan-buttner · 2025-02-06T18:29:44Z

...a/org/elasticsearch/xpack/inference/services/elasticsearch/ElasticsearchInternalService.java

        for (var model : models) {
            assert model instanceof ElasticsearchInternalModel;

            if (model instanceof ElasticsearchInternalModel esModel) {
-                modelsByDeploymentIds.put(esModel.mlNodeDeploymentId(), esModel);
+                if (modelsByDeploymentIds.containsKey(esModel.mlNodeDeploymentId()) == false) {


nit: I think the if-else can be distilled to something like this:

    modelsByDeploymentIds.merge(
        esModel.mlNodeDeploymentId(),
        new ArrayList<>(List.of(esModel)),
        (a, b) -> { a.addAll(b); return a; }
    );

Nice, I like this method much better! I'll go ahead and make that change.
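For readers unfamiliar with `Map.merge`, here is a self-contained sketch of the grouping pattern suggested in this thread. It is not the merged code: `MergeGroupingSketch`, `Model`, and `groupByDeployment` are hypothetical stand-ins, and only the `merge` call mirrors the suggestion. When the key is absent, `merge` stores the new single-element list; when it is present, the remapping function appends the new element to the existing list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MergeGroupingSketch {
    // Hypothetical stand-in for ElasticsearchInternalModel.
    record Model(String inferenceId, String deploymentId) {}

    static Map<String, List<Model>> groupByDeployment(List<Model> models) {
        Map<String, List<Model>> modelsByDeploymentIds = new HashMap<>();
        for (Model model : models) {
            modelsByDeploymentIds.merge(
                model.deploymentId(),
                // Wrapped in ArrayList so the stored list stays mutable.
                new ArrayList<>(List.of(model)),
                (existing, incoming) -> {
                    existing.addAll(incoming);
                    return existing;
                }
            );
        }
        return modelsByDeploymentIds;
    }
}
```

The wrapping `ArrayList` matters: `List.of` returns an immutable list, so calling `addAll` on it directly would throw `UnsupportedOperationException`.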

dan-rubinstein · 2025-02-10T15:30:28Z

@elasticmachine merge upstream

elasticsearchmachine · 2025-02-10T17:51:05Z

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 121821

…ng model deployment (elastic#121821) * Fix get all inference endponts not returning multiple endpoints sharing model deployment * Update docs/changelog/121821.yaml * Clean up modelsByDeploymentId generation code --------- Co-authored-by: Elastic Machine <[email protected]>

…ng model deployment (#121821) (#122206) * Fix get all inference endponts not returning multiple endpoints sharing model deployment * Update docs/changelog/121821.yaml * Clean up modelsByDeploymentId generation code --------- Co-authored-by: Elastic Machine <[email protected]>

…ng model deployment (#121821) (#122210) * Fix get all inference endponts not returning multiple endpoints sharing model deployment * Update docs/changelog/121821.yaml * Clean up modelsByDeploymentId generation code --------- Co-authored-by: Elastic Machine <[email protected]>

…ng model deployment (#121821) (#122208) * Fix get all inference endponts not returning multiple endpoints sharing model deployment * Update docs/changelog/121821.yaml * Clean up modelsByDeploymentId generation code --------- Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Joe Gallo <[email protected]>

Fix get all inference endponts not returning multiple endpoints shari…

23175fc

…ng model deployment

dan-rubinstein added the >bug, :ml, Team:ML, v9.0.0, v8.19.0, and v9.1.0 labels Feb 5, 2025

Update docs/changelog/121821.yaml

493aaad

Merge branch 'main' into inference-multiple-endpoints-for-deployment

2ef5ca0

dan-rubinstein marked this pull request as ready for review February 6, 2025 15:46

jonathan-buttner approved these changes Feb 6, 2025

dan-rubinstein added the v8.18.0 and auto-backport labels Feb 6, 2025

Clean up modelsByDeploymentId generation code

03f4dde

dan-rubinstein requested a review from jonathan-buttner February 6, 2025 20:25

jonathan-buttner added the v9.0.1 and v8.18.1 labels Feb 7, 2025

jonathan-buttner approved these changes Feb 7, 2025

Merge branch 'main' into inference-multiple-endpoints-for-deployment

1103d6f

dan-rubinstein merged commit 3810864 into elastic:main Feb 10, 2025
17 checks passed

elasticsearchmachine added the backport pending label Feb 10, 2025

dan-rubinstein mentioned this pull request Feb 10, 2025

[9.0] Fix get all inference endponts not returning multiple endpoints sharing model deployment (#121821) #122206

Merged

dan-rubinstein mentioned this pull request Feb 10, 2025

[8.x] Fix get all inference endponts not returning multiple endpoints sharing model deployment (#121821) #122208

Merged

dan-rubinstein mentioned this pull request Feb 10, 2025

[8.18] Fix get all inference endponts not returning multiple endpoints sharing model deployment (#121821) #122210

Merged

dan-rubinstein deleted the inference-multiple-endpoints-for-deployment branch February 11, 2025 14:32
