Fix a bug in significant_terms #127975

Merged on May 9, 2025 (2 commits)

@nik9000 nik9000 commented May 9, 2025

Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it.

This mostly hits when you run a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly. So that's likely what's really hitting this: a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
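
To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is a toy model, not Elasticsearch's code: the class `SubsetSizeSketch` and its methods are hypothetical names invented for illustration, and the guard shown is only one possible way to handle an ordinal that was never collected. The actual change in this PR is not shown in this conversation.

```java
import java.util.Arrays;

/** Hypothetical, simplified model of per-bucket counters; not Elasticsearch's actual classes. */
final class SubsetSizeSketch {
    /** Document counts indexed by the parent ("above") agg's bucket ordinal. */
    private long[] subsetSizes = new long[0];

    /** Called once per matching document; grows the array only for ordinals that are seen. */
    void collect(int owningBucketOrd) {
        if (owningBucketOrd >= subsetSizes.length) {
            subsetSizes = Arrays.copyOf(subsetSizes, owningBucketOrd + 1);
        }
        subsetSizes[owningBucketOrd]++;
    }

    /** Unguarded read: assumes every ordinal the parent asks about was collected at least once. */
    long subsetSizeUnchecked(int owningBucketOrd) {
        return subsetSizes[owningBucketOrd];
    }

    /** Guarded read: an ordinal that was never collected is treated as an empty subset. */
    long subsetSizeChecked(int owningBucketOrd) {
        return owningBucketOrd < subsetSizes.length ? subsetSizes[owningBucketOrd] : 0L;
    }

    public static void main(String[] args) {
        SubsetSizeSketch agg = new SubsetSizeSketch();
        // The parent has three ranges, but every match lands in the first one.
        agg.collect(0);
        agg.collect(0);
        for (int ord = 0; ord < 3; ord++) {
            System.out.println("range " + ord + " -> " + agg.subsetSizeChecked(ord));
        }
        try {
            agg.subsetSizeUnchecked(2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded read failed for range 2");
        }
    }
}
```

In this model the parent still builds results for all three ranges, so the child is asked about ordinals 1 and 2 even though it only ever saw ordinal 0. That matches the "matches are all early in the date range" scenario described above.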

@nik9000 nik9000 requested a review from not-napoleon May 9, 2025 15:17
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team, ESQL/Aggs/Geo) May 9, 2025
@elasticsearchmachine
Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Hi @nik9000, I've created a changelog YAML for you.

@not-napoleon not-napoleon left a comment

LGTM, thank you for fixing this.

@nik9000 nik9000 added the auto-backport (automatically create backport pull requests when merged) and v8.17.7 labels May 9, 2025
@nik9000 nik9000 merged commit da553b1 into elastic:main May 9, 2025
17 checks passed
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 9, 2025
@elasticsearchmachine
💔 Backport failed

| Status | Branch | Result |
| --- | --- | --- |
| | 8.19 | |
| | 8.17 | Commit could not be cherrypicked due to conflicts |
| | 8.18 | Commit could not be cherrypicked due to conflicts |
| | 9.0 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 127975`

@nik9000
nik9000 commented May 9, 2025

backport to 9.0: #127992
backport to 8.18: #127993

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 9, 2025
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 9, 2025
elasticsearchmachine pushed a commit that referenced this pull request May 9, 2025
@nik9000 nik9000 removed the v8.17.7 label May 9, 2025
@nik9000
nik9000 commented May 9, 2025

8.17 isn't backporting well. We'll keep this in 8.18+.

elasticsearchmachine pushed a commit that referenced this pull request May 9, 2025
elasticsearchmachine pushed a commit that referenced this pull request May 9, 2025
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request May 12, 2025
julio-santana pushed a commit that referenced this pull request May 12, 2025
prdoyle pushed a commit that referenced this pull request May 12, 2025

Co-authored-by: Nik Everett
Labels: :Analytics/Aggregations, auto-backport, backport pending, >bug, Team:Analytics, v8.18.2, v8.19.0, v9.0.2, v9.1.0
3 participants