Add Highlighter for Semantic Text Fields #118064
This PR introduces a new highlighter, `semantic`, tailored for semantic text fields. It extracts the most relevant fragments by scoring nested chunks using the original semantic query. In this initial version, the highlighter returns only the original chunks computed during ingestion. However, this is an implementation detail, and future enhancements could combine multiple chunks to generate the fragments.
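The scoring idea described above can be illustrated with a small, self-contained sketch. This is not the PR's Java implementation: the names `Chunk`, `score_chunk`, and `semantic_highlight` are hypothetical, and the plain dot product is an assumed stand-in for whatever score the original semantic query produces for each nested chunk. It only shows the shape of the technique: score every chunk stored at ingestion time against the query, then return the best-scoring chunks unchanged as fragments.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Chunk:
    """Hypothetical stand-in for one nested chunk computed during ingestion."""
    text: str
    embedding: Sequence[float]


def score_chunk(chunk: Chunk, query_embedding: Sequence[float]) -> float:
    # Assumption: a dot product stands in for the score the semantic query
    # would assign to this chunk.
    return sum(c * q for c, q in zip(chunk.embedding, query_embedding))


def semantic_highlight(
    chunks: List[Chunk],
    query_embedding: Sequence[float],
    number_of_fragments: int = 1,
) -> List[str]:
    # Rank chunks from best to worst score and return the original chunk
    # texts as fragments; chunks are neither merged nor trimmed.
    ranked = sorted(
        chunks,
        key=lambda chunk: score_chunk(chunk, query_embedding),
        reverse=True,
    )
    return [chunk.text for chunk in ranked[:number_of_fragments]]


if __name__ == "__main__":
    chunks = [
        Chunk("first chunk", [1.0, 0.0]),
        Chunk("second chunk", [0.0, 1.0]),
        Chunk("third chunk", [0.5, 0.5]),
    ]
    print(semantic_highlight(chunks, [1.0, 0.0], number_of_fragments=2))
```

Because the fragments are the stored chunks themselves, a later version could change how fragments are assembled (for example by combining neighbouring chunks) without changing this calling pattern.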
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Hi @jimczi, I've created a changelog YAML for you.
Looks great overall! Could we also add highlighting YAML tests?
Looks good to me overall.
LGTM once CI is green
LGTM!
💚 Backport successful
* Add Highlighter for Semantic Text Fields (#118064)
* Update x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighterTests.java
… of inner_hits (#204962)
## Summary
Switch to use elastic/elasticsearch#118064 when retrieving Knowledge base Index entry docs.
Followed testing instructions from #198020.
Results: five screenshots dated 2024-12-19 (images not reproduced).
… of inner_hits (elastic#204962)
## Summary
Switch to use elastic/elasticsearch#118064 when retrieving Knowledge base Index entry docs.
Followed testing instructions from elastic#198020.
Results: five screenshots dated 2024-12-19 (images not reproduced).
(cherry picked from commit 5539000)
…nstead of inner_hits (#204962) (#206509)
# Backport
This will backport the following commits from `main` to `8.x`:
- [[Security Assistant] Migrate semantic_text to use highlighter instead of inner_hits (#204962)](#204962)
### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
Does this semantic highlighter support dense vectors?