Add Highlighter for Semantic Text Fields #118064

Merged 21 commits on Dec 6, 2024

Contributor

@jimczi jimczi commented Dec 5, 2024

This PR introduces a new highlighter, `semantic`, tailored for semantic text fields.
It extracts the most relevant fragments by scoring nested chunks using the original semantic query.

In this initial version, the highlighter returns only the original chunks computed during ingestion. However, this is an implementation detail, and future enhancements could combine multiple chunks to generate the fragments.
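
As a rough illustration of how a client might request this highlighter, the sketch below builds a `_search` request body that pairs a `semantic` query with a highlight section of type `semantic`. The field name `body`, the query text, and the helper name `build_semantic_highlight_request` are hypothetical, and it is an assumption here that the general highlight options `number_of_fragments` and `order` apply to this highlighter in the same way as to the others.

```python
import json


def build_semantic_highlight_request(field, query_text, number_of_fragments=2):
    """Build a _search request body that highlights a semantic_text field.

    `field` is assumed to be mapped as semantic_text; the name is hypothetical.
    """
    return {
        # The semantic query supplies the text used to score the chunks.
        "query": {"semantic": {"field": field, "query": query_text}},
        "highlight": {
            "fields": {
                field: {
                    # Select the highlighter described in this PR.
                    "type": "semantic",
                    # Assumed: standard highlight options are honored.
                    "number_of_fragments": number_of_fragments,
                    "order": "score",
                }
            }
        },
    }


if __name__ == "__main__":
    body = build_semantic_highlight_request("body", "What is a highlighter?")
    print(json.dumps(body, indent=2))
```

The resulting JSON could be sent as the body of a search request against an index containing the field; each returned fragment would then be one of the chunks computed during ingestion, as described above.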

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Dec 5, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Hi @jimczi, I've created a changelog YAML for you.

Contributor

@Mikep86 Mikep86 left a comment

Looks great overall! Could we also add highlighting YAML tests?

Member

@kderusso kderusso left a comment

Looks good to me overall.

@jimczi
Contributor Author

jimczi commented Dec 6, 2024

@Mikep86 @kderusso I added the yml tests and addressed your other comments.

Contributor

@Mikep86 Mikep86 left a comment

LGTM once CI is green

Member

@kderusso kderusso left a comment

LGTM!

@jimczi jimczi added the auto-backport label (Automatically create backport pull requests when merged) Dec 6, 2024
@jimczi jimczi merged commit c580024 into elastic:main Dec 6, 2024
16 checks passed
@jimczi jimczi deleted the semantic_highlighter branch December 6, 2024 18:42
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.x

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 6, 2024
elasticsearchmachine pushed a commit that referenced this pull request Dec 6, 2024
* Add Highlighter for Semantic Text Fields (#118064)

* Update x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighterTests.java
patrykkopycinski added a commit to elastic/kibana that referenced this pull request Jan 10, 2025
… of inner_hits (#204962)

## Summary

Switch to using the highlighter from elastic/elasticsearch#118064 when
retrieving Knowledge Base index entry docs.

Followed the testing instructions from
#198020

Results:
<img width="1498" alt="Screenshot 2024-12-19 at 16 32 28"
src="https://github.com/user-attachments/assets/a16bf729-ac30-4ea7-9b11-6e9ecca842dc"
/>

<img width="1495" alt="Screenshot 2024-12-19 at 16 32 38"
src="https://github.com/user-attachments/assets/016c08c3-9865-4461-86a5-638e9559b202"
/>

<img width="1502" alt="Screenshot 2024-12-19 at 16 32 43"
src="https://github.com/user-attachments/assets/37a14a2d-191d-420c-940d-1de649e082fd"
/>

<img width="1491" alt="Screenshot 2024-12-19 at 16 32 47"
src="https://github.com/user-attachments/assets/e2be1e95-6fc8-4149-b1ff-2e8b8a9a0a8d"
/>

<img width="1494" alt="Screenshot 2024-12-19 at 16 32 50"
src="https://github.com/user-attachments/assets/38b17f44-e349-46ab-8069-80d1a3fd42ae"
/>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jan 10, 2025
… of inner_hits (elastic#204962)

(cherry picked from commit 5539000)
patrykkopycinski added a commit to elastic/kibana that referenced this pull request Jan 14, 2025
…nstead of inner_hits (#204962) (#206509)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Security Assistant] Migrate semantic_text to use highlighter instead
of inner_hits (#204962)](#204962)

<!--- Backport version: 8.9.8 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

viduni94 pushed a commit to viduni94/kibana that referenced this pull request Jan 23, 2025
… of inner_hits (elastic#204962)

@Imran-ml
Imran-ml commented Jun 4, 2025

Does this semantic highlighter support dense vectors?

Labels
auto-backport (Automatically create backport pull requests when merged)
>feature
:Search Relevance/Highlighting (How a query matched a document)
Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
v8.18.0
v9.0.0
5 participants