Lists: | pgsql-hackers |
---|
From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Redundant Result node |
Date: | 2024-08-22 07:34:05 |
Message-ID: | CAMbWs48TosSvmnz88663_2yg3hfeOFss-J2PtnENDH6J_rLnRQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
I ran into a query plan where the Result node seems redundant to me:
create table t (a int, b int, c int);
insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
create index on t (a, b);
analyze t;
set enable_hashagg to off;
set enable_seqscan to off;
explain (verbose, costs off)
select distinct b, a from t order by a, b;
QUERY PLAN
---------------------------------------------------------
Result
Output: b, a
-> Unique
Output: a, b
-> Index Only Scan using t_a_b_idx on public.t
Output: a, b
(6 rows)
What I expect is that both the Scan node and the Unique node output
'b, a', and we do not need an additional projection step, something
like:
explain (verbose, costs off)
select distinct b, a from t order by a, b;
QUERY PLAN
---------------------------------------------------
Unique
Output: b, a
-> Index Only Scan using t_a_b_idx on public.t
Output: b, a
(4 rows)
I looked into this a little bit and found that in function
create_ordered_paths, we decide whether a projection step is needed
based on a simple pointer comparison between sorted_path->pathtarget
and final_target.
/* Add projection step if needed */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);
This does not seem right to me, as PathTargets are not canonical, so
we cannot guarantee that two identical PathTargets will have the same
pointer. Actually, for the query above, the two PathTargets are
identical but have different pointers.
I wonder if we need to invent a function to compare two PathTargets.
Alternatively, in this case, would it suffice to simply compare
PathTarget.exprs?
Thanks
Richard
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-22 11:15:00 |
Message-ID: | CAApHDvo14gwvtSdhYzn=6CdUSaVpnMPwUct4Qx4j+fO+5TixoA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, 22 Aug 2024 at 19:34, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> /* Add projection step if needed */
> if (sorted_path->pathtarget != target)
> sorted_path = apply_projection_to_path(root, ordered_rel,
> sorted_path, target);
>
> This does not seem right to me, as PathTargets are not canonical, so
> we cannot guarantee that two identical PathTargets will have the same
> pointer. Actually, for the query above, the two PathTargets are
> identical but have different pointers.
>
> I wonder if we need to invent a function to compare two PathTargets.
> Alternatively, in this case, would it suffice to simply compare
> PathTarget.exprs?
I think tlist.c would be a good home for such a function. If you go
with the function route, then it's easier to add optimisations such as
checking if the pointers are equal before going to the trouble of
checking if the exprs match.
David
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-22 11:32:43 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 22.08.24 09:34, Richard Guo wrote:
> I looked into this a little bit and found that in function
> create_ordered_paths, we decide whether a projection step is needed
> based on a simple pointer comparison between sorted_path->pathtarget
> and final_target.
>
> /* Add projection step if needed */
> if (sorted_path->pathtarget != target)
> sorted_path = apply_projection_to_path(root, ordered_rel,
> sorted_path, target);
>
> This does not seem right to me, as PathTargets are not canonical, so
> we cannot guarantee that two identical PathTargets will have the same
> pointer. Actually, for the query above, the two PathTargets are
> identical but have different pointers.
>
> I wonder if we need to invent a function to compare two PathTargets.
Wouldn't the normal node equal() work?
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-22 12:02:48 |
Message-ID: | CAApHDvqWSmLMJnVCK86LY2cWz=y03-6cP2rTScqXPu8GQG18PQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, 22 Aug 2024 at 23:33, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> > I wonder if we need to invent a function to compare two PathTargets.
>
> Wouldn't the normal node equal() work?
It might. I think has_volatile_expr might be missing a
pg_node_attr(equal_ignore).
David
From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-22 13:02:20 |
Message-ID: | CAEudQArO-eZw4FR2_RTEbhaRjLoZagTzN2t5hjeMVs0B5vkuzg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi.
Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux(at)gmail(dot)com>
escreveu:
> I ran into a query plan where the Result node seems redundant to me:
>
> create table t (a int, b int, c int);
> insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
> create index on t (a, b);
> analyze t;
>
> set enable_hashagg to off;
> set enable_seqscan to off;
>
> explain (verbose, costs off)
> select distinct b, a from t order by a, b;
> QUERY PLAN
> ---------------------------------------------------------
> Result
> Output: b, a
> -> Unique
> Output: a, b
> -> Index Only Scan using t_a_b_idx on public.t
> Output: a, b
> (6 rows)
>
> What I expect is that both the Scan node and the Unique node output
> 'b, a', and we do not need an additional projection step, something
> like:
>
> explain (verbose, costs off)
> select distinct b, a from t order by a, b;
> QUERY PLAN
> ---------------------------------------------------
> Unique
> Output: b, a
> -> Index Only Scan using t_a_b_idx on public.t
> Output: b, a
> (4 rows)
>
> I looked into this a little bit and found that in function
> create_ordered_paths, we decide whether a projection step is needed
> based on a simple pointer comparison between sorted_path->pathtarget
> and final_target.
>
> /* Add projection step if needed */
> if (sorted_path->pathtarget != target)
> sorted_path = apply_projection_to_path(root, ordered_rel,
> sorted_path, target);
>
> This does not seem right to me, as PathTargets are not canonical, so
> we cannot guarantee that two identical PathTargets will have the same
> pointer. Actually, for the query above, the two PathTargets are
> identical but have different pointers.
>
Could memcmp solve this?
With patch attached, using memcmp to compare the pointers.
select distinct b, a from t order by a, b;
QUERY PLAN
----------------------------------
Sort
Output: b, a
Sort Key: t.a, t.b
-> HashAggregate
Output: b, a
Group Key: t.a, t.b
-> Seq Scan on public.t
Output: a, b, c
(8 rows)
attached patch for consideration.
best regards,
Ranier Vilela
Attachment | Content-Type | Size |
---|---|---|
0001-avoid-redudant-result-node.patch | application/octet-stream | 974 bytes |
From: | Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com> |
---|---|
To: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Cc: | Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-22 20:17:13 |
Message-ID: | CA+FpmFdu-ocfiyO4u81k86YHG-Jf-GtC2DToGrcnYUg2EcQ8Kw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, 22 Aug 2024 at 15:02, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> Hi.
>
> Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux(at)gmail(dot)com>
> escreveu:
>
>> I ran into a query plan where the Result node seems redundant to me:
>>
>> create table t (a int, b int, c int);
>> insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
>> create index on t (a, b);
>> analyze t;
>>
>> set enable_hashagg to off;
>> set enable_seqscan to off;
>>
>> explain (verbose, costs off)
>> select distinct b, a from t order by a, b;
>> QUERY PLAN
>> ---------------------------------------------------------
>> Result
>> Output: b, a
>> -> Unique
>> Output: a, b
>> -> Index Only Scan using t_a_b_idx on public.t
>> Output: a, b
>> (6 rows)
>>
>> What I expect is that both the Scan node and the Unique node output
>> 'b, a', and we do not need an additional projection step, something
>> like:
>>
>> explain (verbose, costs off)
>> select distinct b, a from t order by a, b;
>> QUERY PLAN
>> ---------------------------------------------------
>> Unique
>> Output: b, a
>> -> Index Only Scan using t_a_b_idx on public.t
>> Output: b, a
>> (4 rows)
>>
>> I looked into this a little bit and found that in function
>> create_ordered_paths, we decide whether a projection step is needed
>> based on a simple pointer comparison between sorted_path->pathtarget
>> and final_target.
>>
>> /* Add projection step if needed */
>> if (sorted_path->pathtarget != target)
>> sorted_path = apply_projection_to_path(root, ordered_rel,
>> sorted_path, target);
>>
>> This does not seem right to me, as PathTargets are not canonical, so
>> we cannot guarantee that two identical PathTargets will have the same
>> pointer. Actually, for the query above, the two PathTargets are
>> identical but have different pointers.
>>
> Could memcmp solve this?
>
> With patch attached, using memcmp to compare the pointers.
>
> select distinct b, a from t order by a, b;
> QUERY PLAN
> ----------------------------------
> Sort
> Output: b, a
> Sort Key: t.a, t.b
> -> HashAggregate
> Output: b, a
> Group Key: t.a, t.b
> -> Seq Scan on public.t
> Output: a, b, c
> (8 rows)
>
> attached patch for consideration.
>
> best regards,
> Ranier Vilela
>
+1 for the idea of removing this redundant node.
I had a look in this patch, and I was wondering if we still need
sorted_path->pathtarget != target in the condition.
Apart from that,
- if (sorted_path->pathtarget != target)
+ if (sorted_path->pathtarget != target &&
+ memcmp(sorted_path->pathtarget, target,
sizeof(PathTarget)) != 0)
An extra space is there, please fix it.
Some regression tests should be added for this.
--
Regards,
Rafia Sabih
From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-23 02:31:14 |
Message-ID: | CAMbWs49=CXNBjRWDcTv2gyu7iTHxMLsduMB_0iEudFdwfVsebA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Aug 22, 2024 at 8:03 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Thu, 22 Aug 2024 at 23:33, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> > > I wonder if we need to invent a function to compare two PathTargets.
> >
> > Wouldn't the normal node equal() work?
>
> It might. I think has_volatile_expr might be missing a
> pg_node_attr(equal_ignore).
Yeah, maybe we can make the node equal() work for PathTarget. We'll
need to remove the no_equal attribute in PathTarget. I think we also
need to specify pg_node_attr(equal_ignore) for PathTarget.cost.
BTW, I'm wondering why we specify no_copy for PathTarget, while
meanwhile implementing a separate function copy_pathtarget() in
tlist.c to copy a PathTarget. Can't we support copyObject() for
PathTarget?
Also the pg_node_attr(array_size(exprs)) attribute for
PathTarget.sortgrouprefs does not seem right to me. In a lot of cases
sortgrouprefs would just be NULL. Usually it is valid only for
upper-level Paths. Hmm, maybe this is why we do not support
copyObject() for PathTarget?
Thanks
Richard
From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-23 03:08:47 |
Message-ID: | CAMbWs49QVFUdgG8K-hb5dL8Hv7h2rsEi+u32W7VdnAbsOJcT8A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Aug 22, 2024 at 3:34 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> /* Add projection step if needed */
> if (sorted_path->pathtarget != target)
> sorted_path = apply_projection_to_path(root, ordered_rel,
> sorted_path, target);
>
> This does not seem right to me, as PathTargets are not canonical, so
> we cannot guarantee that two identical PathTargets will have the same
> pointer. Actually, for the query above, the two PathTargets are
> identical but have different pointers.
FWIW, the redundant-projection issue is more common in practice than I
initially thought. For a simple query as below:
explain (verbose, costs off) select a from t order by 1;
QUERY PLAN
----------------------------
Sort
Output: a
Sort Key: t.a
-> Seq Scan on public.t
Output: a
(5 rows)
... we'll always make a separate ProjectionPath on top of the SortPath
in create_ordered_paths. It’s only when we create the plan node for
the projection step in createplan.c that we realize a separate Result
is unnecessary. This is not efficient.
Thanks
Richard
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-23 03:19:17 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> ... we'll always make a separate ProjectionPath on top of the SortPath
> in create_ordered_paths. It’s only when we create the plan node for
> the projection step in createplan.c that we realize a separate Result
> is unnecessary. This is not efficient.
I'm not sure you're considering "efficiency" in the right light.
In my mind, any time we can postpone work from path-creation time
to plan-creation time, we're probably winning because we create
many more paths than plans. Perhaps that's wrong in this case,
but it's not anywhere near as obvious as you suggest.
regards, tom lane
From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-23 03:48:37 |
Message-ID: | CAMbWs4_9Amakf6JWnzLuk=wfvjt35JiYVqz9prMaberRXHVO7w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Aug 23, 2024 at 11:19 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > ... we'll always make a separate ProjectionPath on top of the SortPath
> > in create_ordered_paths. It’s only when we create the plan node for
> > the projection step in createplan.c that we realize a separate Result
> > is unnecessary. This is not efficient.
>
> I'm not sure you're considering "efficiency" in the right light.
> In my mind, any time we can postpone work from path-creation time
> to plan-creation time, we're probably winning because we create
> many more paths than plans. Perhaps that's wrong in this case,
> but it's not anywhere near as obvious as you suggest.
I agree that it’s always desirable to postpone work from path-creation
time to plan-creation time. In this case, however, it’s a little
different. The projection step could actually be avoided from the
start if we perform the correct check in create_ordered_paths.
Thanks
Richard
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-23 03:56:15 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> On Fri, Aug 23, 2024 at 11:19 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm not sure you're considering "efficiency" in the right light.
> I agree that it’s always desirable to postpone work from path-creation
> time to plan-creation time. In this case, however, it’s a little
> different. The projection step could actually be avoided from the
> start if we perform the correct check in create_ordered_paths.
Well, the question is how expensive is the "correct check" compared
to what we're doing now. It might be cheaper than creating an extra
level of path node, or it might not. An important factor here is
that we'd pay the extra cost of a more complex check every time,
whether it avoids creation of an extra path node or not.
regards, tom lane
From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-23 08:27:42 |
Message-ID: | CAMbWs49zn0YiyadpfUUcZwHzePz+12cs5s4A+wW9ccVC28Rg1A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Aug 23, 2024 at 11:56 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > I agree that it’s always desirable to postpone work from path-creation
> > time to plan-creation time. In this case, however, it’s a little
> > different. The projection step could actually be avoided from the
> > start if we perform the correct check in create_ordered_paths.
>
> Well, the question is how expensive is the "correct check" compared
> to what we're doing now. It might be cheaper than creating an extra
> level of path node, or it might not. An important factor here is
> that we'd pay the extra cost of a more complex check every time,
> whether it avoids creation of an extra path node or not.
Fair point. After looking at the code for a while, I believe it is
sufficient to compare PathTarget.exprs after we've checked that the
two targets have different pointers.
The sorted_path here should have projected the correct target required
by the preceding steps of sort, i.e. sort_input_target. We need to
determine whether this target matches final_target. If this target is
the same pointer as sort_input_target, a simple pointer comparison, as
the current code does, is sufficient, because if no post-sort
projection is needed, sort_input_target will always be equal to
final_target.
However, sorted_path's target might not be the same pointer as
sort_input_target, because in apply_scanjoin_target_to_paths, if the
target to be applied has the same expressions as the existing
reltarget, we only inject the sortgroupref info into the existing
pathtargets, rather than create projection paths. As a result,
pointer comparison in create_ordered_paths is not reliable.
Instead, we can compare PathTarget.exprs to determine whether a
projection step is needed. If the expressions match, we can be
confident that a post-sort projection is not required.
If this conclusion is correct, I think the extra cost of comparing
PathTarget.exprs only when pointer comparison fails should be
acceptable. We have already done this for
apply_scanjoin_target_to_paths, and I think the rationale there
applies here as well:
* ... By avoiding the creation of
* projection paths we save effort both immediately and at plan creation time.
Besides, it can help avoid a Result node in the final plan in some
cases, as shown by my initial example.
Hence, I propose the attached fix. There are two ensuing plan diffs
in the regression tests, but they look reasonable and are exactly what
we are fixing here.
Thanks
Richard
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Avoid-unnecessary-post-sort-projection.patch | application/octet-stream | 2.7 KB |
From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com> |
Cc: | Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-23 12:29:03 |
Message-ID: | CAEudQAp5Dwhpuj2278KfQdK6GnrZo5xg9h7X39ZSUOG9LuHsDg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Rafia.
Em qui., 22 de ago. de 2024 às 17:17, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
escreveu:
>
>
> On Thu, 22 Aug 2024 at 15:02, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
>
>> Hi.
>>
>> Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux(at)gmail(dot)com>
>> escreveu:
>>
>>> I ran into a query plan where the Result node seems redundant to me:
>>>
>>> create table t (a int, b int, c int);
>>> insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
>>> create index on t (a, b);
>>> analyze t;
>>>
>>> set enable_hashagg to off;
>>> set enable_seqscan to off;
>>>
>>> explain (verbose, costs off)
>>> select distinct b, a from t order by a, b;
>>> QUERY PLAN
>>> ---------------------------------------------------------
>>> Result
>>> Output: b, a
>>> -> Unique
>>> Output: a, b
>>> -> Index Only Scan using t_a_b_idx on public.t
>>> Output: a, b
>>> (6 rows)
>>>
>>> What I expect is that both the Scan node and the Unique node output
>>> 'b, a', and we do not need an additional projection step, something
>>> like:
>>>
>>> explain (verbose, costs off)
>>> select distinct b, a from t order by a, b;
>>> QUERY PLAN
>>> ---------------------------------------------------
>>> Unique
>>> Output: b, a
>>> -> Index Only Scan using t_a_b_idx on public.t
>>> Output: b, a
>>> (4 rows)
>>>
>>> I looked into this a little bit and found that in function
>>> create_ordered_paths, we decide whether a projection step is needed
>>> based on a simple pointer comparison between sorted_path->pathtarget
>>> and final_target.
>>>
>>> /* Add projection step if needed */
>>> if (sorted_path->pathtarget != target)
>>> sorted_path = apply_projection_to_path(root, ordered_rel,
>>> sorted_path, target);
>>>
>>> This does not seem right to me, as PathTargets are not canonical, so
>>> we cannot guarantee that two identical PathTargets will have the same
>>> pointer. Actually, for the query above, the two PathTargets are
>>> identical but have different pointers.
>>>
>> Could memcmp solve this?
>>
>> With patch attached, using memcmp to compare the pointers.
>>
>> select distinct b, a from t order by a, b;
>> QUERY PLAN
>> ----------------------------------
>> Sort
>> Output: b, a
>> Sort Key: t.a, t.b
>> -> HashAggregate
>> Output: b, a
>> Group Key: t.a, t.b
>> -> Seq Scan on public.t
>> Output: a, b, c
>> (8 rows)
>>
>> attached patch for consideration.
>>
>> best regards,
>> Ranier Vilela
>>
>
> +1 for the idea of removing this redundant node.
> I had a look in this patch, and I was wondering if we still need
> sorted_path->pathtarget != target in the condition.
>
Although the test is unnecessary, it is cheap and avoids a possible call to
memcmp.
Thanks.
best regards,
Ranier Vilela
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-26 12:58:29 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 23.08.24 10:27, Richard Guo wrote:
> On Fri, Aug 23, 2024 at 11:56 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
>>> I agree that it’s always desirable to postpone work from path-creation
>>> time to plan-creation time. In this case, however, it’s a little
>>> different. The projection step could actually be avoided from the
>>> start if we perform the correct check in create_ordered_paths.
>>
>> Well, the question is how expensive is the "correct check" compared
>> to what we're doing now. It might be cheaper than creating an extra
>> level of path node, or it might not. An important factor here is
>> that we'd pay the extra cost of a more complex check every time,
>> whether it avoids creation of an extra path node or not.
>
> Fair point. After looking at the code for a while, I believe it is
> sufficient to compare PathTarget.exprs after we've checked that the
> two targets have different pointers.
- if (sorted_path->pathtarget != target)
+ if (sorted_path->pathtarget != target &&
+ !equal(sorted_path->pathtarget->exprs, target->exprs))
sorted_path = apply_projection_to_path(root, ordered_rel,
equal() already checks whether both pointers are equal, so I think this
could be simplified to just
if (!equal(sorted_path->pathtarget->exprs, target->exprs))
From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-27 03:29:52 |
Message-ID: | CAMbWs4_JtkpPsF8+Xh=StbzTd3KqGeUgmuDiucyJhfG2e3=3EA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Aug 26, 2024 at 8:58 PM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> On 23.08.24 10:27, Richard Guo wrote:
> > Fair point. After looking at the code for a while, I believe it is
> > sufficient to compare PathTarget.exprs after we've checked that the
> > two targets have different pointers.
>
> - if (sorted_path->pathtarget != target)
> + if (sorted_path->pathtarget != target &&
> + !equal(sorted_path->pathtarget->exprs, target->exprs))
> sorted_path = apply_projection_to_path(root, ordered_rel,
>
> equal() already checks whether both pointers are equal, so I think this
> could be simplified to just
>
> if (!equal(sorted_path->pathtarget->exprs, target->exprs))
Indeed. If the target pointers are equal, the PathTarget.exprs
pointers must be equal too.
Attached is the updated patch with this change.
Thanks
Richard
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Avoid-unnecessary-post-sort-projection.patch | application/octet-stream | 4.8 KB |
From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-27 03:43:02 |
Message-ID: | CAMbWs4_B07DUvWKo9Nv1j1DSZo4aUypcsJiv5Xqc_5yQRP=gkg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Aug 22, 2024 at 9:02 PM Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux(at)gmail(dot)com> escreveu:
>> This does not seem right to me, as PathTargets are not canonical, so
>> we cannot guarantee that two identical PathTargets will have the same
>> pointer. Actually, for the query above, the two PathTargets are
>> identical but have different pointers.
>
> Could memcmp solve this?
Hmm, I don't think memcmp works for nodes that contain pointers.
Thanks
Richard
From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Redundant Result node |
Date: | 2024-08-27 11:26:15 |
Message-ID: | CAEudQAqrO-qyMuOHJcPp4L4wLJtOuOSEduHVkfLexMTc+AC05g@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Em ter., 27 de ago. de 2024 às 00:43, Richard Guo <guofenglinux(at)gmail(dot)com>
escreveu:
> On Thu, Aug 22, 2024 at 9:02 PM Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> > Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <
> guofenglinux(at)gmail(dot)com> escreveu:
> >> This does not seem right to me, as PathTargets are not canonical, so
> >> we cannot guarantee that two identical PathTargets will have the same
> >> pointer. Actually, for the query above, the two PathTargets are
> >> identical but have different pointers.
> >
> > Could memcmp solve this?
>
> Hmm, I don't think memcmp works for nodes that contain pointers.
>
The first case which memcmp can fail is if both pointers are null.
But considering the current behavior, the cost vs benefit favors memcmp.
best regards,
Ranier Vilela