-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ESQL: Fix alias removal in regex extraction with JOIN
#127687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Fix alias removal in regex extraction with JOIN
#127687
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
buildkite test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution is not optimal. Maybe it fixes the problem, but the canRemoveAliases
references in multiple places makes the code hard to follow and reason about.
Instead, look at the code that is tied to lookup join
and enrich
where, after seeing this issue with dissect
, it is clear that this code - p.forEachExpressionDown(Alias.class, alias -> {
shouldn't only apply to Alias
(which comes from eval
commands), but also to whatever comes out from dissect
(meaning, any user-defined columns).
Rather than using canRemoveAliases[0]
in RegexExtract
check, look into adjusting the code I mentioned above.
Thank you for your review and detailed guidance @astefan! Your keen eye for code goes far beyond mine, I still have a long journey ahead in learning and improving. I've updated the branch, please feel free to review it again at your convenience. If there are any further adjustments needed, I'm more than happy to make them. |
buildkite test this |
buildkite test this |
AttributeSet planRefs = p.references(); | ||
Set<String> fieldNames = planRefs.names(); | ||
p.forEachExpressionDown(Alias.class, alias -> { | ||
p.forEachExpressionDown(NamedExpression.class, expression -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a naming preference towards NamedExpression
- ne
that's used in other places in code.
p.forEachExpressionDown(NamedExpression.class, expression -> { | |
p.forEachExpressionDown(NamedExpression.class, ne-> { |
@@ -1650,3 +1650,69 @@ event_duration:long | |||
2764889 | |||
3450233 | |||
; | |||
|
|||
|
|||
joinMaskingRegex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a big fan of having such test queries here. At least, add a comment with the link to the original bug report.
joinMaskingDissect | ||
required_capability: join_lookup_v12 | ||
required_capability: fix_join_masking_regex_extract | ||
from sample_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this test and the one below in IndexResolverFieldNamesTests as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you missed this request here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry! I misunderstood what you meant. I was actually thinking about adding comments of links to the original issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in d56383b
return; | ||
} | ||
referencesBuilder.removeIf(attr -> matchByName(attr, alias.name(), keepCommandRefsBuilder.contains(attr))); | ||
referencesBuilder.removeIf( | ||
attr -> matchByName(attr, expression.name(), (expression instanceof Alias) && keepCommandRefsBuilder.contains(attr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this check here? (expression instanceof Alias)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't give it much thought since skipIfPattern
was set to false
for RegexExtract
. After considering it briefly, I think removing this check wouldn't hurt too much.
buildkite test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kanoshiou, please address all reviews.
buildkite test this |
…emoval # Conflicts: # x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java # x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/** | ||
* During resolution (pre-analysis) we have to consider that joins can override regex extracted values | ||
* see <a href="https://github.com/elastic/elasticsearch/issues/127467"> ES|QL: pruning of JOINs leads to missing fields #127467 </a> | ||
*/ | ||
FIX_JOIN_MASKING_REGEX_EXTRACT, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add this capability at the end of the list of capabilities. It will be slightly easier for me to manually backport the PR, if the automatic backport fails.
When you get the chance, please integrate new changes from |
Thanks @astefan. The branch has been updated. |
buildkite test this |
1 similar comment
buildkite test this |
💔 Backport failed
You can use sqren/backport to manually backport by running |
Thank you @astefan! |
* Disallow removal of regex extracted fields --------- Co-authored-by: Andrei Stefan <[email protected]> Co-authored-by: elasticsearchmachine <[email protected]> (cherry picked from commit 557f1f1)
* Disallow removal of regex extracted fields --------- Co-authored-by: Andrei Stefan <[email protected]> Co-authored-by: elasticsearchmachine <[email protected]> (cherry picked from commit 557f1f1)
…28202) * Disallow removal of regex extracted fields --------- (cherry picked from commit 557f1f1) Co-authored-by: kanoshiou <[email protected]> Co-authored-by: elasticsearchmachine <[email protected]>
…28204) * ESQL: Fix alias removal in regex extraction with `JOIN` (#127687) * Disallow removal of regex extracted fields --------- Co-authored-by: Andrei Stefan <[email protected]> Co-authored-by: elasticsearchmachine <[email protected]> (cherry picked from commit 557f1f1) * Checkstyle --------- Co-authored-by: kanoshiou <[email protected]> Co-authored-by: elasticsearchmachine <[email protected]>
Because we aim to minimize the number of attributes required from
field_caps
at pre-analysis time, a removal check helps eliminate fields defined by users (such aseval
). However, this removal logic does not apply toReferenceAttribute
produced by regex extraction commandsgrok
anddissect
, even though these fields are also user-defined.Closes #127467