
ESQL: Fix alias removal in regex extraction with JOIN #127687



Merged


kanoshiou
Contributor

@kanoshiou kanoshiou commented May 4, 2025

To minimize the number of fields requested from field_caps at pre-analysis time, a removal check drops fields that the user defines in the query (for example, with eval). However, this removal logic does not apply to the ReferenceAttributes produced by the regex extraction commands grok and dissect, even though those fields are also user-defined.

Closes #127467
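A minimal sketch of the idea in the description, under heavy simplification: field names are plain strings, and the caller supplies which names the query defines and which must be requested anyway. The class and method names (FieldCapsPruningSketch, fieldsToRequest) and the keep parameter are hypothetical and are not the EsqlSession code; the point is only that a column created by dissect or grok should be pruned by the same rule as one created by eval.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of pre-analysis pruning; not the actual EsqlSession implementation.
public class FieldCapsPruningSketch {

    /**
     * Returns the names that would still be sent to field_caps.
     *
     * @param referenced  every name the query references, in order
     * @param userDefined names the query defines itself (eval aliases, grok/dissect outputs)
     * @param keep        names that must be requested anyway, e.g. because a later
     *                    LOOKUP JOIN may supply a column with the same name (assumed input)
     */
    static Set<String> fieldsToRequest(List<String> referenced, List<String> userDefined, Set<String> keep) {
        Set<String> result = new LinkedHashSet<>(referenced);
        for (String name : userDefined) {
            // Same rule for every user-defined column, regardless of the command that created it.
            if (!keep.contains(name)) {
                result.remove(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Roughly: ... | dissect message "%{type}" | eval x = 1 | ... | keep message, type, x
        List<String> referenced = List.of("message", "type", "x");
        List<String> userDefined = List.of("type", "x");
        System.out.println(fieldsToRequest(referenced, userDefined, Set.of()));       // [message]
        System.out.println(fieldsToRequest(referenced, userDefined, Set.of("type"))); // [message, type]
    }
}
```

In this model the second call mirrors the situation in the linked issue: if a join may override a regex-extracted value, that name has to stay in the request instead of being pruned as user-defined.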

@elasticsearchmachine elasticsearchmachine added the v9.1.0, needs:triage, and external-contributor labels May 4, 2025
@astefan astefan self-requested a review May 5, 2025 07:44
@astefan astefan self-assigned this May 5, 2025
@astefan astefan added the >bug and :Analytics/ES|QL labels and removed the needs:triage label May 5, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label May 5, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@astefan
Contributor

astefan commented May 5, 2025

buildkite test this

Contributor

@astefan astefan left a comment

The solution is not optimal. It may fix the problem, but referencing canRemoveAliases in multiple places makes the code hard to follow and reason about.

Instead, look at the code tied to lookup join and enrich. This issue with dissect makes it clear that the `p.forEachExpressionDown(Alias.class, alias -> {` block shouldn't apply only to Alias (which comes from eval commands) but also to whatever comes out of dissect, meaning any user-defined column.

Rather than checking canRemoveAliases[0] in the RegexExtract branch, look into adjusting the code I mentioned above.
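To illustrate the reviewer's point, here is a sketch using hypothetical stand-in types (not the real ES|QL classes): an eval alias and a dissect output are both named expressions, so a visitor keyed on Alias sees only the first, while one keyed on NamedExpression sees both. In the real code the visitor is `p.forEachExpressionDown(...)`; here a plain loop over a list plays that role, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the ES|QL expression hierarchy; not the real classes.
public class NamedExpressionWalkSketch {

    interface NamedExpression {
        String name();
    }

    record Alias(String name) implements NamedExpression {}              // e.g. produced by eval
    record ReferenceAttribute(String name) implements NamedExpression {} // e.g. produced by dissect/grok

    /** Collects the names of every expression of the requested type, like a type-filtered visitor. */
    static <T extends NamedExpression> List<String> definedNames(List<NamedExpression> exprs, Class<T> type) {
        List<String> names = new ArrayList<>();
        for (NamedExpression e : exprs) {
            if (type.isInstance(e)) {
                names.add(type.cast(e).name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<NamedExpression> exprs = List.of(new Alias("x"), new ReferenceAttribute("type"));
        System.out.println(definedNames(exprs, Alias.class));           // [x]: misses the dissect output
        System.out.println(definedNames(exprs, NamedExpression.class)); // [x, type]: all user-defined columns
    }
}
```

Keying the visitor on the common supertype removes the need for a separate flag in the RegexExtract branch, which is the simplification the review asks for.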

@astefan astefan requested a review from luigidellaquila May 5, 2025 17:41
@kanoshiou
Contributor Author

Thank you for your review and detailed guidance, @astefan! Your eye for code is far keener than mine; I still have a long way to go in learning and improving.

I've updated the branch; please feel free to review it again at your convenience. If any further adjustments are needed, I'm more than happy to make them.

@astefan
Contributor

astefan commented May 9, 2025

buildkite test this

@astefan astefan self-requested a review May 9, 2025 09:28
@astefan
Contributor

astefan commented May 9, 2025

buildkite test this

 AttributeSet planRefs = p.references();
 Set<String> fieldNames = planRefs.names();
-p.forEachExpressionDown(Alias.class, alias -> {
+p.forEachExpressionDown(NamedExpression.class, expression -> {
Contributor

Just a naming preference: `NamedExpression ne` is the convention used in other places in the code.

Suggested change
-p.forEachExpressionDown(NamedExpression.class, expression -> {
+p.forEachExpressionDown(NamedExpression.class, ne -> {

@@ -1650,3 +1650,69 @@ event_duration:long
2764889
3450233
;


joinMaskingRegex
Contributor

I am not a big fan of having such test queries here. At least, add a comment with the link to the original bug report.

joinMaskingDissect
required_capability: join_lookup_v12
required_capability: fix_join_masking_regex_extract
from sample_data
Contributor

Add this test and the one below to IndexResolverFieldNamesTests as well.

Contributor

I think you missed this request here.

Contributor Author

Oh, sorry! I misunderstood what you meant. I was actually thinking of adding comments with links to the original issue.

Contributor Author

Added in d56383b

 return;
 }
-referencesBuilder.removeIf(attr -> matchByName(attr, alias.name(), keepCommandRefsBuilder.contains(attr)));
+referencesBuilder.removeIf(
+    attr -> matchByName(attr, expression.name(), (expression instanceof Alias) && keepCommandRefsBuilder.contains(attr))
Contributor

Why the `(expression instanceof Alias)` check here?

Contributor Author

@kanoshiou kanoshiou May 10, 2025

I didn't give it much thought, since skipIfPattern was set to false for RegexExtract. Having considered it briefly, I think removing this check wouldn't do much harm.
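The thread above turns on the third argument of matchByName. The sketch below is a hypothetical reading of that flag, assuming it means "never match, and therefore never prune, a reference that is a wildcard pattern"; the real signature and semantics in EsqlSession may differ, and the class name and the simple `*` matcher are illustrative only. It shows why passing false, as the regex-extract path did, allows a keep wildcard to be pruned, while passing true preserves it.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical simplification of name matching during pruning; not the actual EsqlSession code.
public class MatchByNameSketch {

    /**
     * Decides whether a referenced name (possibly a '*' wildcard such as "region*" from a keep
     * command) matches a user-defined column name, and would therefore be pruned.
     * With skipIfPattern set, wildcard references never match, so they are never pruned.
     */
    static boolean matchByName(String referenced, String defined, boolean skipIfPattern) {
        boolean isPattern = referenced.indexOf('*') >= 0;
        if (isPattern && skipIfPattern) {
            return false;
        }
        return isPattern ? wildcardMatch(referenced, defined) : referenced.equals(defined);
    }

    /** Minimal glob matcher: '*' matches any run of characters; everything else is literal. */
    static boolean wildcardMatch(String pattern, String text) {
        String regex = Arrays.stream(pattern.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchByName("region*", "region_name", false)); // true: pattern may be pruned
        System.out.println(matchByName("region*", "region_name", true));  // false: pattern is kept
        System.out.println(matchByName("type", "type", true));            // true: exact names still match
    }
}
```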

@astefan
Contributor

astefan commented May 12, 2025

buildkite test this

Contributor

@astefan astefan left a comment

@kanoshiou, please address all reviews.

@astefan
Contributor

astefan commented May 12, 2025

buildkite test this

kanoshiou added 2 commits May 13, 2025 09:15
…emoval

# Conflicts:
#	x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
@astefan astefan self-requested a review May 20, 2025 10:04
Contributor

@astefan astefan left a comment

LGTM

Comment on lines 1045 to 1050
/**
* During resolution (pre-analysis) we have to consider that joins can override regex extracted values
* see <a href="https://github.com/elastic/elasticsearch/issues/127467"> ES|QL: pruning of JOINs leads to missing fields #127467 </a>
*/
FIX_JOIN_MASKING_REGEX_EXTRACT,

Contributor

Please add this capability at the end of the list of capabilities. That will make it slightly easier for me to backport the PR manually if the automatic backport fails.
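For context on why the capability matters for backports: spec tests such as joinMaskingDissect gate themselves with required_capability lines, and fix_join_masking_regex_extract is the lowercase form of the FIX_JOIN_MASKING_REGEX_EXTRACT constant. The sketch below models that gating with a hypothetical enum and helpers (CapabilitySketch, supported, canRun); it assumes a capability's name is its constant name lowercased and that a test runs only if all of its required capabilities are supported, which is a simplification of the real test infrastructure.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of capability gating; not the real EsqlCapabilities class or test runner.
public class CapabilitySketch {

    // New constants are appended at the end of the list.
    enum Cap {
        JOIN_LOOKUP_V12,
        FIX_JOIN_MASKING_REGEX_EXTRACT;

        // Assumption: the name used in required_capability is the lowercased constant name.
        String capabilityName() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    /** Capability names this (hypothetical) node advertises. */
    static Set<String> supported() {
        return Arrays.stream(Cap.values()).map(Cap::capabilityName).collect(Collectors.toSet());
    }

    /** A spec test runs only if every one of its required_capability entries is supported. */
    static boolean canRun(List<String> required, Set<String> supported) {
        return supported.containsAll(required);
    }

    public static void main(String[] args) {
        List<String> required = List.of("join_lookup_v12", "fix_join_masking_regex_extract");
        System.out.println(canRun(required, supported()));               // true
        System.out.println(canRun(required, Set.of("join_lookup_v12"))); // false: e.g. a branch without the fix
    }
}
```

In this model, a branch that lacks the new constant simply does not run the gated tests, so the capability has to travel with the fix when it is backported.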

@astefan
Contributor

astefan commented May 20, 2025

When you get the chance, please merge the latest changes from main, and I'll start a new round of CI tests. Thank you.

@astefan astefan added the v8.19.0, v9.0.2, and auto-backport labels May 20, 2025
@kanoshiou
Contributor Author

Thanks @astefan. The branch has been updated.

@astefan
Contributor

astefan commented May 20, 2025

buildkite test this

@astefan
Contributor

astefan commented May 20, 2025

buildkite test this

@astefan astefan merged commit 557f1f1 into elastic:main May 20, 2025
19 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Branch  Result
8.19    Commit could not be cherry-picked due to conflicts
9.0     Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 127687

@kanoshiou
Contributor Author

Thank you @astefan!

astefan added a commit to astefan/elasticsearch that referenced this pull request May 20, 2025
* Disallow removal of regex extracted fields
---------

Co-authored-by: Andrei Stefan <[email protected]>
Co-authored-by: elasticsearchmachine <[email protected]>
(cherry picked from commit 557f1f1)
astefan added a commit to astefan/elasticsearch that referenced this pull request May 20, 2025
* Disallow removal of regex extracted fields
---------

Co-authored-by: Andrei Stefan <[email protected]>
Co-authored-by: elasticsearchmachine <[email protected]>
(cherry picked from commit 557f1f1)
elasticsearchmachine added a commit that referenced this pull request May 20, 2025
…28202)

* Disallow removal of regex extracted fields
---------



(cherry picked from commit 557f1f1)

Co-authored-by: kanoshiou <[email protected]>
Co-authored-by: elasticsearchmachine <[email protected]>
elasticsearchmachine added a commit that referenced this pull request May 21, 2025
…28204)

* ESQL: Fix alias removal in regex extraction with `JOIN` (#127687)

* Disallow removal of regex extracted fields
---------

Co-authored-by: Andrei Stefan <[email protected]>
Co-authored-by: elasticsearchmachine <[email protected]>
(cherry picked from commit 557f1f1)

* Checkstyle

---------

Co-authored-by: kanoshiou <[email protected]>
Co-authored-by: elasticsearchmachine <[email protected]>
Labels
:Analytics/ES|QL, auto-backport, >bug, external-contributor, Team:Analytics, v8.19.0, v9.0.2, v9.1.0
Development

Successfully merging this pull request may close these issues.

ES|QL: pruning of JOINs leads to missing fields