Skip to content

ESQL: Enable physical plan verification #118114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Dec 5, 2024

This enables the physical plan verification.
For it, a couple of changes needed to be applied/corrected:

  • AggregateMapper creates attributes with unique names;
  • AggregateExec's verification needs not consider ordinal attribute(s);
  • LookupJoinExec needs to merge attributes of same name at output, "winning" the right child;
  • ExchangeExec does no input referencing, since it only outputs all synthetic attributes, "sourced" from remote exchanges;
  • FieldExtractExec doesn't reference the attributes it "produces".

Fixes #105436.

@bpintea bpintea added >enhancement auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Dec 5, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

@bpintea bpintea marked this pull request as ready for review December 6, 2024 15:16
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 6, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@alex-spies alex-spies self-requested a review December 6, 2024 16:23
@alex-spies
Copy link
Contributor

I think this is the remainder of #105436. Finally! Thanks @bpintea .

@bpintea
Copy link
Contributor Author

bpintea commented Dec 6, 2024

I think this is the remainder of #105436.

Yes, will "fixes" it here, thanks. (I actually started with verifying plans from fragments after being replanned. But the spacial aggregations makes it tricky, so stripped that out for now and what remains could address this issue, true.)
Some more tests will follow, but wanted to check for any wrong assumptions in the corrections above first.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - thanks for looking into this and cleaning this up, especially the AggregateMapper.
LGTM

Comment on lines +55 to +57
if (p instanceof AggregateExec agg) {
var ordinalAttributes = agg.ordinalAttributes();
missing.removeAll(Expressions.references(ordinalAttributes));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better.

List<Attribute> orginalAttributs = new ArrayList<>(groupings.size());
if (groupings().size() == 1) {
// CATEGORIZE requires the standard hash aggregator as well.
if (groupings.get(0).anyMatch(e -> e instanceof Categorize) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (groupings.get(0).anyMatch(e -> e instanceof Categorize) == false) {
if (groupings.get(0).noneMatch(e -> e instanceof Categorize)) {

I think this could also be merged with above if to minimize nesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

groupings is a list of Expression (not a stream), so won't have noneMatch(). But I think refuting a predicate is quite common, adding a noneMatch() would actually make sense. Will leave this as is for now, but will follow up.

I think this could also be merged with above if to minimize nesting.

I've only grafted this patch, so I can only guess that the intention might have been to allow for more exceptions (or easier comments). But maybe this can be now merged and split again when needed.

if (groupings.get(0).anyMatch(e -> e instanceof Categorize) == false) {
var leaves = new LinkedList<>();
// TODO: this seems out of place
aggregates.stream().filter(a -> groupings.contains(a) == false).forEach(a -> leaves.addAll(a.collectLeaves()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use collect instead of addAll in forEach.
Since it is used for .contains check below, should it collect to set instead of the list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, turned it into a Set.

Lets use collect instead of addAll in forEach.

Yeh, that'd be an alternative, but with stream chaining and the line breaking, it wouldn't be more succint than current format, so I've left as is.

@bpintea bpintea added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 9, 2024
@elasticsearchmachine elasticsearchmachine merged commit 2be8b67 into elastic:main Dec 9, 2024
16 checks passed
@bpintea bpintea deleted the enh/enable_physical_verification branch December 9, 2024 22:04
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

bpintea added a commit to bpintea/elasticsearch that referenced this pull request Dec 9, 2024
This enables the physical plan verification. For it, a couple of changes
needed to be applied/corrected: * AggregateMapper creates attributes
with unique names; * AggregateExec's verification needs not consider
ordinal attribute(s); * LookupJoinExec needs to merge attributes of same
name at output, "winning" the right child; * ExchangeExec does no input
referencing, since it only outputs all synthetic attributes, "sourced"
from remote exchanges; * FieldExtractExec doesn't reference the
attributes it "produces".
Comment on lines +96 to +98
var addedFieldsNames = addedFields.stream().map(Attribute::name).toList();
lazyOutput.removeIf(a -> addedFieldsNames.contains(a.name()));
lazyOutput.addAll(addedFields);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is very interesting. Thanks for catching this.

I'm wondering if enforcing unique names for PhysicalPlan outputs is actually the right way to go. It is what we assumed so far, so I don't wanna depart from here, but there's also a reason to be more lenient:

In PhysicalPlan land, this output method was technically correct, because it represented exactly what the physical operators will do: they're gonna append the blocks for the added fields to the pages that are processed. The blocks corresponding to shadowed attributes/channels will not be stripped from incoming pages.

This means that the output layout technically has duplicate attribute names. This change just hides them and makes them inaccessible during the physical planning stage.

FWIW, EvalExec has the exact same situation. Eval performs shadowing, but EvalOperator can only ever append blocks, so the output pages actually still contain the blocks corresponding to shadowed attributes.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late to the party, but had to take a look anyway because this will interact with LOOKUP JOIN a little.

Just wanted to say thank you again and LGTM!

elasticsearchmachine pushed a commit that referenced this pull request Dec 12, 2024
)

* ESQL: Enable physical plan verification (#118114)

This enables the physical plan verification. For it, a couple of changes
needed to be applied/corrected: * AggregateMapper creates attributes
with unique names; * AggregateExec's verification needs not consider
ordinal attribute(s); * LookupJoinExec needs to merge attributes of same
name at output, "winning" the right child; * ExchangeExec does no input
referencing, since it only outputs all synthetic attributes, "sourced"
from remote exchanges; * FieldExtractExec doesn't reference the
attributes it "produces".

* ESQL: Disable remote enrich verification (#118534)

This disables verifying the plans generated for remote ENRICHing.
It also re-enables corresponding failing test.

Related: #118531
Fixes #118307.

(cherry picked from commit e7a4436)
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
…#118534) (elastic#118302)

* ESQL: Enable physical plan verification (elastic#118114)

This enables the physical plan verification. For it, a couple of changes
needed to be applied/corrected: * AggregateMapper creates attributes
with unique names; * AggregateExec's verification needs not consider
ordinal attribute(s); * LookupJoinExec needs to merge attributes of same
name at output, "winning" the right child; * ExchangeExec does no input
referencing, since it only outputs all synthetic attributes, "sourced"
from remote exchanges; * FieldExtractExec doesn't reference the
attributes it "produces".

* ESQL: Disable remote enrich verification (elastic#118534)

This disables verifying the plans generated for remote ENRICHing.
It also re-enables corresponding failing test.

Related: elastic#118531
Fixes elastic#118307.

(cherry picked from commit e7a4436)
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
…#118534) (elastic#118302)

* ESQL: Enable physical plan verification (elastic#118114)

This enables the physical plan verification. For it, a couple of changes
needed to be applied/corrected: * AggregateMapper creates attributes
with unique names; * AggregateExec's verification needs not consider
ordinal attribute(s); * LookupJoinExec needs to merge attributes of same
name at output, "winning" the right child; * ExchangeExec does no input
referencing, since it only outputs all synthetic attributes, "sourced"
from remote exchanges; * FieldExtractExec doesn't reference the
attributes it "produces".

* ESQL: Disable remote enrich verification (elastic#118534)

This disables verifying the plans generated for remote ENRICHing.
It also re-enables corresponding failing test.

Related: elastic#118531
Fixes elastic#118307.

(cherry picked from commit e7a4436)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: Re-enable dependency checks
5 participants