Description
Elasticsearch Version
9.0.0
Installed Plugins
No response
Java Version
bundled
OS Version
any
Problem Description
Inference endpoints created with the elasticsearch service are backed by models deployed on ML nodes. The Trained Model APIs can also be used to interact with that same backing model deployment. It is possible to stop the deployment with the Trained Model APIs, but doing so breaks the inference endpoint because its backing model deployment has been removed.
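The overlap is easy to see: the deployment created for the endpoint shows up in the Trained Model APIs. For example, the model's deployment stats should list a deployment whose deployment_id matches the inference endpoint ID (a sketch; the exact response shape is omitted):

GET _ml/trained_models/.elser_model_2_linux-x86_64/_stats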
If the deployment is used by an ingest processor, the stop action returns a warning and requires the force parameter. At the very least, the code should also check for usage by an inference endpoint and require the force parameter. Alternatively, it should not be possible at all to stop a deployment managed by the Inference API with the Trained Model APIs.
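A sketch of the suggested behavior, assuming the same force semantics as the existing ingest-processor check: a plain stop request against a deployment used by an inference endpoint would be rejected with a warning, and the stop would only succeed with the force query parameter set.

POST _ml/trained_models/my-elser-inference-endpoint/deployment/_stop?force=true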
Steps to Reproduce
First, create an inference endpoint using the elasticsearch service to deploy the model on an ML node.
PUT _inference/sparse_embedding/my-elser-inference-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 2,
    "model_id": ".elser_model_2_linux-x86_64"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
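Optionally, confirm that the endpoint works before stopping the deployment (the input text here is just an example):

POST _inference/sparse_embedding/my-elser-inference-endpoint
{
  "input": "The quick brown fox jumps over the lazy dog"
}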
Then stop the deployment with the Trained Model API. The deployment stops, and the inference endpoint can no longer be used because its backing model deployment is gone.
POST _ml/trained_models/my-elser-inference-endpoint/deployment/_stop
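Repeating the same inference request now fails because the backing deployment no longer exists (exact error response omitted):

POST _inference/sparse_embedding/my-elser-inference-endpoint
{
  "input": "The quick brown fox jumps over the lazy dog"
}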
Logs (if relevant)
No response