Skip to content

ESQL: Fix sneaky bug in single value query #127146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 22, 2025

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Apr 22, 2025

Fixes a sneaky bug in single value query that happens when run against a keyword field that:

  • Is defined on every field
  • Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index with two documents:

{"a": "foo"}
{"a": ["foo", "bar"]}

I don't think this is particularly likely in production, but it's quite likely in tests. Which is where I hit this - in the serverless tests we index an index with four documents into three shards and two of the documents look just like this. So about 1/3 or the time we triggered this bug.

Mechanically this is triggered by the SingleValueMatchQuery incorrectly rewriting itself to MatchAll in the scenario above. This fixes that.

Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
@nik9000 nik9000 requested review from dnhatn and iverase April 22, 2025 11:11
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000 nik9000 added auto-backport Automatically create backport pull requests when merged v9.0.1 and removed v9.0.0 labels Apr 22, 2025
Comment on lines 218 to 229
if (v instanceof Double n) {
fields.add(new DoubleField("foo", n, Field.Store.NO));
} else if (v instanceof Float n) {
fields.add(new DoubleField("foo", n, Field.Store.NO));
} else if (v instanceof Number n) {
long l = n.longValue();
fields.add(new LongField("foo", l, Field.Store.NO));
} else if (v instanceof String s) {
fields.add(new KeywordField("foo", v.toString(), Field.Store.NO));
} else {
throw new UnsupportedOperationException();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: could be simplified with switch like in #120875

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Nik!

@nik9000 nik9000 merged commit c2fdc06 into elastic:main Apr 22, 2025
17 checks passed
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 22, 2025
Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.18
8.x
9.0

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 22, 2025
Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 22, 2025
Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
elasticsearchmachine pushed a commit that referenced this pull request Apr 22, 2025
Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
elasticsearchmachine pushed a commit that referenced this pull request Apr 22, 2025
* ESQL: Fix sneaky bug in single value query (#127146)

Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.

* Revert "Switch"

This reverts commit cc7805d.
elasticsearchmachine pushed a commit that referenced this pull request Apr 22, 2025
* ESQL: Fix sneaky bug in single value query (#127146)

Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.

* Revert "Switch"

This reverts commit cc7805d.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.1 v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants