Add POST _unified for the inference API #3313
Following you can find the validation results for the APIs you have changed.
"stability": "stable", | ||
"visibility": "public", | ||
"headers": { | ||
"accept": ["text/event-stream"], |
This is what we have for the streaming API here: https://github.com/elastic/elasticsearch-specification/blob/main/specification/_json_spec/inference.stream_inference.json#L10
I'm not sure whether this is correct: is it indicating that the POST request accepts an event stream, or that the response returns one? 🤔
Both HTTP headers are used in client requests:
- Accept tells Elasticsearch that returning event streams is OK
- Content-Type is about the request's body, which is indeed JSON here.
In other words, this is correct, thank you!
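Putting those two points together, the headers section of the new spec file would presumably mirror the existing inference.stream_inference.json spec, along these lines:

```json
"headers": {
  "accept": ["text/event-stream"],
  "content_type": ["application/json"]
}
```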
/**
 * Perform inference on the service using the Unified Schema
 * @rest_spec_name inference.unified_inference
`unified_inference` seemed to follow `stream_inference`. I'm open to other ideas though.
No strong opinions, this seems OK.
/**
 * Perform inference on the service using the Unified Schema
 * @rest_spec_name inference.unified_inference
 * @availability stack since=8.18.0 stability=stable visibility=public
@davidkyle Double check that this is what we want
This looks good to me.
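For orientation, here is a minimal sketch of how the full request class could look around those annotations, following the RequestBase pattern used elsewhere in this repository. The path parts and body fields below are illustrative assumptions, not the final spec.

```ts
import { RequestBase } from '@_types/Base'
import { Id } from '@_types/common'

// Illustrative message shape; the real spec models roles, tool calls, etc. in more detail.
interface Message {
  role: string
  content: string
}

/**
 * Perform inference on the service using the Unified Schema
 * @rest_spec_name inference.unified_inference
 * @availability stack since=8.18.0 stability=stable visibility=public
 */
export interface Request extends RequestBase {
  path_parts: {
    /** Task type, e.g. chat_completion (assumed to be optional here). */
    task_type?: string
    /** The inference endpoint to run the request against. */
    inference_id: Id
  }
  body: {
    /** Provider model name; assumed optional since the endpoint can define it. */
    model?: string
    /** The conversation to run inference on. */
    messages: Message[]
  }
}
```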
Thanks! I answered your questions, but did not look at the contents yet. Do we have YAML tests in Elasticsearch for this feature? It would help to validate the requests.
"stability": "stable", | ||
"visibility": "public", | ||
"headers": { | ||
"accept": ["text/event-stream"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both HTTP headers are used in client requests:
- Accept tells Elasticsearch that returning event streams is OK
- Content-Type is about the request's body, which is indeed JSON here.
In other words, this is correct, thank you!
|
||
/** | ||
* Perform inference on the service using the Unified Schema | ||
* @rest_spec_name inference.unified_inference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinions, this seems OK.
/** | ||
* Perform inference on the service using the Unified Schema | ||
* @rest_spec_name inference.unified_inference | ||
* @availability stack since=8.18.0 stability=stable visibility=public |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We usually try to have only the request class in the corresponding Request file, and put all other types in the `types` folder above (this is just to make it easier to maintain).
Never mind that; could we just move the main Request type to the top of the file?
@jonathan-buttner I have a question about the response format: is there a way to get it in JSON format, or is it just text/stream?
Thanks for the review! Currently it's only text/event-stream.
Thanks for the review! Not yet, let me see if I can add some.
The thing is: if the return type is just text, then the response body should be just plain text rather than a set of modeled classes.
Oh sorry, I might have misunderstood. Here's what Postman parses out of the stream: a sequence of messages, where each one of those messages is valid JSON (except for the final one). Do we still want the response to be only text?
Postman makes it look mainly like JSON, but those are server-sent events. Is this unified API going to be used for multiple providers? OpenAI, Anthropic Claude, and Google Gemini all use server-sent events but then provide really different structures: https://til.simonwillison.net/llms/streaming-llm-apis. If yes, then clients would need to know what provider this is (in the content-type header?). There would not be a need to map the options in the spec; it's OK to treat it as an opaque string or buffer. Also, I'm afraid that more complete discussions around this will have to wait for next year.
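For context, a server-sent event stream is plain text on the wire: each event is a `data:` line separated from the next by a blank line, and only the JSON payload inside each event differs per provider. A made-up excerpt (the payload fields here are invented for illustration):

```
data: {"id":"chat-1","delta":{"content":"Hel"}}

data: {"id":"chat-1","delta":{"content":"lo"}}

data: [DONE]
```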
Yeah that makes sense
Sorry, I should have provided more details when opening the PR. The hope with this API is to provide a consistent request and response schema that will be the same across multiple providers. Internally we'll handle transforming the request and response to match the individual providers and our schema. When we get a request for Anthropic, we'll translate our "unified" schema into a request for Anthropic specifically, and then as we get responses from Anthropic we'll translate them back into the format that fits the response schema we're proposing here.
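As a rough illustration of that idea (these field names are assumptions for this example, not the final schema), a single unified request body might look like the following, and the service would translate it into the matching OpenAI or Anthropic call internally:

```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "Say hello" }
  ],
  "temperature": 0.7
}
```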
Hey @jonathan-buttner! The team had a discussion about this and we agreed to keep the result as a simple ArrayBuffer, and leave it to the users to parse it using the tools they prefer. This means adding a new type in the Binary.ts file and using that as the whole response, so UnifiedResponse.ts wraps just that type; the specific classes modeled only for the response can therefore be deleted.
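A minimal sketch of what that could look like, assuming the new type is named StreamResult and follows the repository's existing conventions:

```ts
// Binary.ts (sketch): the streamed response is treated as an opaque byte buffer.
export type StreamResult = ArrayBuffer
```

and UnifiedResponse.ts would then just wrap it:

```ts
// UnifiedResponse.ts (sketch): the whole response body is the raw stream.
import { StreamResult } from '@_types/Binary'

export class Response {
  body: StreamResult
}
```

Client-side, users can then decode the buffer and split it into server-sent events with whatever tooling they prefer; a hypothetical helper:

```ts
// Hypothetical helper: decode the buffer and extract each event's `data:` payload.
function parseEventPayloads(buffer: ArrayBuffer): string[] {
  const text = new TextDecoder('utf-8').decode(buffer)
  return text
    .split('\n\n') // events are separated by blank lines
    .map(chunk => chunk.trim())
    .filter(chunk => chunk.startsWith('data:'))
    .map(chunk => chunk.slice('data:'.length).trim())
}
```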
LGTM! thank you
* Adding the unified api
* Fixing formatting
* Addressing feedback and removing response

Co-authored-by: Laura Trotta <[email protected]>
(cherry picked from commit 2a57733)
Co-authored-by: Jonathan Buttner <[email protected]>
This PR adds the `_unified` route for the Inference API. This coincides with the elasticsearch PR: elastic/elasticsearch#117589