should we have a fast-path planning for OLTP starjoins?

Lists:	pgsql-hackers

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 14:00:49
Message-ID:	[email protected]

Hi,

While running benchmarks for my "performance over 20 years" talk [1],
I've also been looking for common cases that don't perform well
(and thus might be a good topic for optimization, with significant
speedup helping a lot of deployments).

One such simple example that I ran into is "OLTP starjoin". You're
probably familiar with star schema in the DSS field [2], as a large fact
table with many small-ish dimensions. The OLTP variant is exactly the
same thing, but with selective WHERE conditions on the fact table.

So you can imagine it as a query of this shape:

SELECT * FROM fact_table f
JOIN dim1 ON (f.id1 = dim1.id)
JOIN dim2 ON (f.id2 = dim2.id)
JOIN dim3 ON (f.id3 = dim3.id)
...
WHERE f.id = 2398723;

This is a surprisingly common query pattern in OLTP applications, thanks
to normalization. For example the "fact" may be a table of transactions
with some basic common details, dimensions are additional "details" for
special types of transactions. When loading info about a transaction of
unknown type, this allows you to load everything at once.

Or maybe the fact table is "users" and the dimensions have all kinds of
info about the user (address, primary e-mail address, balance, ...).

Anyway, this pattern is quite common, yet it performs quite poorly.
Let's join a fact table with 10 dimensions - see the attached create
script to build such schema, and the test.sql script for pgbench.

On my new Ryzen machine, this peaks at about 16k tps with 16 clients.
The machine can easily do 1M tps in read-only pgbench, for example. And
if you increase the join_collapse_limit to 12 (because the default 8 is
not enough for the 10 dimensions), the throughput drops to ~2k tps.
That's not great.

AFAIK this is a consequence of the star joins allowing arbitrary join
order of the dimensions - those only have join conditions to the fact
relation, so it allows many join orders. So exploring them takes a lot
of time, of course.
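
To put a rough number on this, here is a small Python sketch (not PostgreSQL code) that counts how many join relations and join pairs an exhaustive search without cartesian products visits for a fact table with n dimensions. It assumes a simplified model in which every dimension joins only to the fact table. The function name is made up for illustration, and the real planner does extra work per pair (several paths and join methods) that this ignores.

```python
from math import comb


def star_join_search_space(n_dims):
    """Count join relations and join pairs for a fact table plus n_dims dimensions.

    Simplified model (an assumption, not the planner's actual bookkeeping):
    every valid join relation is the fact table plus a non-empty subset of
    dimensions, and it can be built by adding any one of its dimensions last.
    """
    joinrels = 0
    join_pairs = 0
    for k in range(1, n_dims + 1):
        subsets = comb(n_dims, k)   # ways to choose k dimensions
        joinrels += subsets         # one join relation per subset
        join_pairs += subsets * k   # k choices for the dimension joined last
    return joinrels, join_pairs


if __name__ == "__main__":
    for n in (7, 10):
        rels, pairs = star_join_search_space(n)
        print(f"{n} dimensions: {rels} join relations, {pairs} join pairs")
```

In this model a search that fixes the dimension order up front would build only n join relations, one per level.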

But for starjoins, a lot of this is not really needed. In the simplest
case (no conditions on dimensions etc) it does not really matter in what
order we join those, and filters on dimensions make it only a little bit
more complicated (join the most selective first).

So I've been wondering how difficult it would be to have a special
fast-path mode for starjoins, completely skipping most of this. I
cobbled together a WIP/PoC patch (attached) on the way from FOSDEM, and
it seems to help quite a bit.

I definitely don't claim the patch is correct for all interesting cases,
just for the example query. And I'm sure there's plenty of things to fix
or improve (e.g. handling of outer joins, more complex joins, ...).

But these are the rough results for 1 and 16 clients:

build      1 client   16 clients   (tps)
--------------------------------------
master         1600        16000
patched        4400        46000

So that about triples the throughput. If you bump join_collapse_limit to
12, it gets even clearer:

build      1 client   16 clients   (tps)
--------------------------------------
master          200         2000
patched        4500        48000

That's more than a 20x improvement - not bad. Sure, this is on a tiny dataset, and
with larger data sets it might need to do I/O, diminishing the benefits.
It's just an example to demonstrate the benefits.

If you want to try the patch, there's a new GUC enable_starjoin to
enable this optimization (off by default).

The patch does roughly this:

1) It tries to detect a "star join" before doing the full join order
search. It simply looks for the largest relation (not considering the
conditions), and assumes it's a fact. And then it searches for relations
that only join to the fact - those are the dimensions.

2) With the relations found in (1) it just builds the join relations
directly (one per level), without exploring all the possibilities. This
is where the speedup comes from.

3) If there are additional relations, those are then left to the regular
join order search algorithm.
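
The three steps above can be sketched in Python as follows. This is only a toy model of the idea, not the patch itself: relations are plain strings, `sizes` and `join_edges` are hypothetical inputs standing in for planner data structures, and both function names are invented for the illustration.

```python
def detect_starjoin(sizes, join_edges):
    """Split relations into (fact, dimensions, rest).

    sizes: {relation name: estimated row count}
    join_edges: iterable of (rel_a, rel_b) pairs, one per join condition
    """
    # Step 1: assume the largest relation is the fact table.
    fact = max(sizes, key=sizes.get)

    # Record which relations each relation has join conditions with.
    partners = {rel: set() for rel in sizes}
    for a, b in join_edges:
        partners[a].add(b)
        partners[b].add(a)

    # Dimensions are the relations that join only to the fact table.
    dims = [r for r in sizes if r != fact and partners[r] == {fact}]
    # Step 3: everything else is left to the regular join search.
    rest = [r for r in sizes if r != fact and r not in dims]
    return fact, dims, rest


def build_star_levels(fact, dims):
    """Step 2: build exactly one join relation per level, in a fixed order."""
    levels = []
    current = (fact,)
    for dim in dims:
        current = current + (dim,)
        levels.append(current)
    return levels


if __name__ == "__main__":
    sizes = {"f": 10000, "d1": 100, "d2": 100, "d3": 100, "x": 50}
    edges = [("f", "d1"), ("f", "d2"), ("f", "d3"), ("d3", "x")]
    fact, dims, rest = detect_starjoin(sizes, edges)
    print(fact, dims, rest)
    print(build_star_levels(fact, dims))
```

In the example, d3 also joins to x, so it is not treated as a dimension and both d3 and x fall through to the regular search.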

There's a lot of stuff that could / should be improved on the current
patch. For (1) we might add support for more complex cases with
snowflake schemas [3] or with multiple fact tables. At the same time (1)
needs to be very cheap, so that it does not regress every non-starjoin
query.

For (2) it might pick a particular order we join the dimensions (by
size, selectivity, ...), and it might consider whether to join them
before/after the other relations.

FWIW I suspect there's a fair amount of research papers looking at
starjoins and what is the optimal plan for such queries, but I didn't
have time to look at that yet. Pointers welcome!

But the bigger question is whether it makes sense to have such fast-path
modes for certain query shapes. The patch "hard-codes" the planning for
starjoin queries, but we clearly can't do that for every possible join
shape (because then why have dynamic join search at all?).

I do think starjoins might be sufficiently unique / special to justify
this, but maybe it would be possible to instead improve the regular join
order search to handle this case better? I don't have a very clear idea
what would that look like, though :-(

I did check what some other databases do, and they often have some
sort of "hint" to let the optimizer know this is a starjoin.

I also looked at what the main bottlenecks are with the simpler starjoin
planning enabled - see the attached flamegraphs. The optimizations seem
to break the stacktraces a bit, so there's an SVG for "-O0 -ggdb3" too,
that doesn't have this issue (the shape is different, but the conclusions
are about the same).

In both cases about 40% of the time is spent in initial_cost_mergejoin,
which seems like a lot - and yes, disabling mergejoin doubles the
throughput. And most of the cost is in get_actual_variable_range,
looking up the range in the btrees. That seems like a lot, considering
the indexes are perfectly clean (we used to have problems with deleted
tuples, but this is not the case). I wonder if maybe we could start
caching this kind of info somewhere.

regards

[1]
https://www.postgresql.eu/events/pgconfeu2024/schedule/session/5585-performance-archaeology/

[2] https://en.wikipedia.org/wiki/Star_schema

[3] https://en.wikipedia.org/wiki/Snowflake_schema

--
Tomas Vondra

Attachment	Content-Type	Size
starjoin.sql	application/sql	452 bytes
starjoin-create.sql	application/sql	2.3 KB
0001-WIP-simplified-planning-of-starjoins.patch	text/x-patch	11.6 KB
starjoin-optim.png	image/png	375.9 KB
starjoin-no-optim.png	image/png	272.0 KB

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 19:43:39
Message-ID:	[email protected]

On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote:
> This is a surprisingly common query pattern in OLTP applications,
> thanks
> to normalization.

+1. Creating a small lookup table should be encouraged rather than
penalized.

Your test data includes a fact table with 10k rows and no index on the
filter condition. In OLTP applications the fact table might often fit
in memory, but I'd still expect it to have an index on the filter
condition. That might not change your overall point, but I'm curious
why you constructed the test that way?

> There's a lot of stuff that could / should be improved on the current
> patch. For (1) we might add support for more complex cases with
> snowflake schemas [3] or with multiple fact tables. At the same time
> (1)
> needs to be very cheap, so that it does not regress every non-
> starjoin
> query.

The patch only considers the largest table as the fact table, which is
a good heuristic of course. However, I'm curious if other approaches
might work. For instance, could we consider the table involved in the
most join conditions to be the fact table?

If you base it on the join conditions rather than the size of the
table, then detection of the star join would be based purely on the
query structure (not stats), which would be nice for predictability.
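
A minimal Python sketch of that alternative heuristic, assuming the join conditions are available as hypothetical (rel_a, rel_b) pairs. The function name is invented, and nothing here reflects how the planner actually represents join clauses.

```python
from collections import Counter


def fact_by_join_count(join_edges):
    """Pick the relation that appears in the most join conditions.

    Uses only the query structure: no row counts or other statistics.
    Returns None when there are no join conditions at all.
    """
    counts = Counter()
    for a, b in join_edges:
        counts[a] += 1
        counts[b] += 1
    if not counts:
        return None
    return max(counts, key=counts.get)


if __name__ == "__main__":
    edges = [("f", "d1"), ("f", "d2"), ("f", "d3"), ("d3", "x")]
    print(fact_by_join_count(edges))
```

Because the result depends only on the query text, the same query always gets the same candidate fact table regardless of how the tables grow.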

> But the bigger question is whether it makes sense to have such fast-
> path
> modes for certain query shapes.

We should explore what kinds of surprising cases it might create, or
what maintenance headaches might come up with future planner changes.
But the performance numbers you posted suggest that we should do
something here.

Regards,
Jeff Davis

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 20:06:53
Message-ID:	[email protected]

On 2/4/25 20:43, Jeff Davis wrote:
> On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote:
>> This is a surprisingly common query pattern in OLTP applications,
>> thanks
>> to normalization.
>
> +1. Creating a small lookup table should be encouraged rather than
> penalized.
>
> Your test data includes a fact table with 10k rows and no index on the
> filter condition. In OLTP applications the fact table might often fit
> in memory, but I'd still expect it to have an index on the filter
> condition. That might not change your overall point, but I'm curious
> why you constructed the test that way?
>

No particular reason. I think I intended to make it a lookup by PK
(which would match the use case examples), and I forgot about that. But
yeah, I would expect an index too.

>
>> There's a lot of stuff that could / should be improved on the current
>> patch. For (1) we might add support for more complex cases with
>> snowflake schemas [3] or with multiple fact tables. At the same time
>> (1)
>> needs to be very cheap, so that it does not regress every non-
>> starjoin
>> query.
>
> The patch only considers the largest table as the fact table, which is
> a good heuristic of course. However, I'm curious if other approaches
> might work. For instance, could we consider the table involved in the
> most join conditions to be the fact table?
>
> If you base it on the join conditions rather than the size of the
> table, then detection of the star join would be based purely on the
> query structure (not stats), which would be nice for predictability.
>

Right, there may be other (possibly better) ways to detect the star join
shape. I was also thinking about requiring foreign keys on the join
clauses - in DWH systems FKeys are sometimes omitted, which would break
the heuristics, but in OLTP it's common to still have them.

I think the cost of the heuristic will be an important metric - I don't
know if the number of join conditions is more expensive to determine
than what the patch does now, though.

>> But the bigger question is whether it makes sense to have such fast-
>> path
>> modes for certain query shapes.
>
> We should explore what kinds of surprising cases it might create, or
> what maintenance headaches might come up with future planner changes.
> But the performance numbers you posted suggest that we should do
> something here.
>

Yes, it seems like an interesting opportunity for starjoin queries. It's
a pretty common query pattern, but it also happens to be very expensive
to plan because the dimensions can be reordered almost arbitrarily.

regards

--
Tomas Vondra

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 20:23:56
Message-ID:	[email protected]

Tomas Vondra <tomas(at)vondra(dot)me> writes:
> On 2/4/25 20:43, Jeff Davis wrote:
>> If you base it on the join conditions rather than the size of the
>> table, then detection of the star join would be based purely on the
>> query structure (not stats), which would be nice for predictability.

> Right, there may be other (possibly better) ways to detect the star join
> shape. I was thinking about also requiring for foreign keys on the join
> clauses - in DWH systems FKeys are sometimes omitted, which would break
> the heuristics, but in OLTP it's common to still have them.

I think you need to insist on foreign keys. Otherwise you don't know
whether the joins will eliminate fact-table rows. If that's a
possibility then it's no longer sensible to ignore different join
orders.

I'm kind of imagining a planner rule like "if table X is joined to
using a match of a foreign-key column to its PK (so that the join
removes no rows from the other table) and there are no other
restriction conditions on table X, then force X to be joined last.
And if there are multiple such tables X, it doesn't matter what
order they are joined in as long as they're last."

The interesting thing about this is we pretty much have all the
infrastructure for detecting such FK-related join conditions
already. Possibly the join order forcing could be done with
existing infrastructure too (by manipulating the joinlist).

regards, tom lane

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 20:34:23
Message-ID:	[email protected]

On 2/4/25 09:00, Tomas Vondra wrote:
> There's a lot of stuff that could / should be improved on the current
> patch. For (1) we might add support for more complex cases with
> snowflake schemas [3] or with multiple fact tables. At the same time (1)
> needs to be very cheap, so that it does not regress every non-starjoin
> query.
>
> For (2) it might pick a particular order we join the dimensions (by
> size, selectivity, ...), and it might consider whether to join them
> before/after the other relations.
>
> FWIW I suspect there's a fair amount of research papers looking at
> starjoins and what is the optimal plan for such queries, but I didn't
> have time to look at that yet. Pointers welcome!
>
> But the bigger question is whether it makes sense to have such fast-path
> modes for certain query shapes. The patch "hard-codes" the planning for
> starjoin queries, but we clearly can't do that for every possible join
> shape (because then why have dynamic join search at all?).

+    /*
+     * Try simplified planning for starjoin. If it succeeds, we should
+     * continue at level startlev.
+     */
+    startlev = starjoin_join_search(root, initial_rels, 2);

(I should probably don a flame retardant suit, but...)

This sounds like an interesting idea, but it makes me wonder if we
should have a more generic mechanism here so that if "some pattern is
matched" then "use some simplified planning method" -- of which the
starjoin is the first and builtin example, but allowing for others to be
plugged in via extensions.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Joe Conway <mail(at)joeconway(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 20:56:27
Message-ID:	[email protected]

On 2/4/25 21:34, Joe Conway wrote:
> On 2/4/25 09:00, Tomas Vondra wrote:
>> There's a lot of stuff that could / should be improved on the current
>> patch. For (1) we might add support for more complex cases with
>> snowflake schemas [3] or with multiple fact tables. At the same time (1)
>> needs to be very cheap, so that it does not regress every non-starjoin
>> query.
>>
>> For (2) it might pick a particular order we join the dimensions (by
>> size, selectivity, ...), and it might consider whether to join them
>> before/after the other relations.
>>
>> FWIW I suspect there's a fair amount of research papers looking at
>> starjoins and what is the optimal plan for such queries, but I didn't
>> have time to look at that yet. Pointers welcome!
>>
>> But the bigger question is whether it makes sense to have such fast-path
>> modes for certain query shapes. The patch "hard-codes" the planning for
>> starjoin queries, but we clearly can't do that for every possible join
>> shape (because then why have dynamic join search at all?).
>
> +    /*
> +     * Try simplified planning for starjoin. If it succeeds, we should
> +     * continue at level startlev.
> +     */
> +    startlev = starjoin_join_search(root, initial_rels, 2);
>
> (I should probably don a flame retardant suit, but...)
>
> This sounds like an interesting idea, but it makes me wonder if we
> should have a more generic mechanism here so that if "some pattern is
> matched" then "use some simplified planning method" -- of which the
> starjoin is the first and builtin example, but allowing for others to be
> plugged in via extensions.
>

We already have join_search_hook_type. I haven't used that in the PoC,
because I wanted to use joinrels.c functions defined as static, etc.

The main challenge would be handling queries that have multiple such
patterns. The current hook is expected to process the whole list, while
what we'd need is more like splitting the list into chunks (one chunk
per query pattern), and then calling the hooks to handle the chunks in
some order.

But I don't think the patch should be required to invent this. We don't
even have an example of a second pattern.

regards

--
Tomas Vondra

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 21:42:24
Message-ID:	[email protected]

On 2/4/25 21:23, Tom Lane wrote:
> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>> On 2/4/25 20:43, Jeff Davis wrote:
>>> If you base it on the join conditions rather than the size of the
>>> table, then detection of the star join would be based purely on the
>>> query structure (not stats), which would be nice for predictability.
>
>> Right, there may be other (possibly better) ways to detect the star join
>> shape. I was thinking about also requiring for foreign keys on the join
>> clauses - in DWH systems FKeys are sometimes omitted, which would break
>> the heuristics, but in OLTP it's common to still have them.
>
> I think you need to insist on foreign keys. Otherwise you don't know
> whether the joins will eliminate fact-table rows. If that's a
> possibility then it's no longer sensible to ignore different join
> orders.
>

Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
of these star join queries with LEFT JOIN too, and then the FKs are not
needed. All you need is a PK / unique index on the other side.

Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
to make this work for most cases, and then the rest would simply use the
regular join order algorithm.
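
The (INNER JOIN + FK) or (LEFT JOIN + PK) condition, combined with the "no other restriction conditions" part of the rule quoted above, could be modelled roughly like this in Python. The `DimJoin` class and its fields are hypothetical stand-ins for information the planner would have to derive. The sketch also assumes the referencing column on the fact side has no NULLs, since an inner join would otherwise still drop rows.

```python
from dataclasses import dataclass


@dataclass
class DimJoin:
    name: str
    join_type: str          # "inner" or "left"
    has_fk_to_pk: bool      # fact column has a foreign key to the dimension's PK
    dim_key_unique: bool    # join column on the dimension side is a PK / unique
    has_restrictions: bool  # other WHERE conditions on the dimension


def preserves_fact_rows(join):
    """True if the join neither removes nor duplicates fact-table rows.

    Assumption for this sketch: the referencing fact column is NOT NULL.
    """
    if join.has_restrictions:
        return False                # a filter on the dimension may remove rows
    if join.join_type == "inner":
        return join.has_fk_to_pk    # FK guarantees exactly one match
    if join.join_type == "left":
        return join.dim_key_unique  # at most one match, unmatched rows are kept
    return False                    # anything else goes to the regular search


if __name__ == "__main__":
    joins = [
        DimJoin("dim1", "inner", True, True, False),
        DimJoin("dim2", "left", False, True, False),
        DimJoin("dim3", "inner", False, True, False),
        DimJoin("dim4", "inner", True, True, True),
    ]
    print([j.name for j in joins if preserves_fact_rows(j)])
```

Joins that fail the check are simply left to the regular join order search.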

I was thinking that if we allow the dimensions to eliminate rows in the
fact table, we'd simply join them starting from the most selective ones.
But that doesn't work if the joins might have different per-row costs
(e.g. because some dimensions are much larger etc). Doing something
smarter would likely end up fairly close to the regular join order
algorithm ...

> I'm kind of imagining a planner rule like "if table X is joined to
> using a match of a foreign-key column to its PK (so that the join
> removes no rows from the other table) and there are not other
> restriction conditions on table X, then force X to be joined last.
> And if there are multiple such tables X, it doesn't matter what
> order they are joined in as long as they're last."
>

I think it'd need to be a bit smarter, to handle (a) snowflake schemas
and (b) additional joins referencing the starjoin result.

The (a) shouldn't be too hard, except that it needs to check the
'secondary dimension' is also joined by FK and has no restrictions, and
then do that join later.

For (b), I don't have numbers but I've seen queries that first do a
starjoin and then add more data to that, e.g. by joining to a
combination of attributes from multiple dimensions (think region +
payment type). Or by joining to some "summary" table that does not have
an explicit FK. Still, we could leave at least some of the joins until
the very end, I guess. But even for the dimensions joined earlier the
order does not really matter.

I think (a) is something we should definitely handle. (b) is more a
DWH/BI thing, not really an OLTP query (which is what this thread is about).

> The interesting thing about this is we pretty much have all the
> infrastructure for detecting such FK-related join conditions
> already. Possibly the join order forcing could be done with
> existing infrastructure too (by manipulating the joinlist).
>

Maybe, interesting. I've ruled out relying on the FKeys early in the
coding, but I'm sure there's infrastructure the patch could use. It'd
still need to check the transitive FK relationships for snowflake joins
to work, ofc. Which is not something we need to consider right now.

What kind of "manipulation" of the joinlist do you have in mind?

regards

--
Tomas Vondra

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 21:55:25
Message-ID:	[email protected]

Tomas Vondra <tomas(at)vondra(dot)me> writes:
>> The interesting thing about this is we pretty much have all the
>> infrastructure for detecting such FK-related join conditions
>> already. Possibly the join order forcing could be done with
>> existing infrastructure too (by manipulating the joinlist).

> Maybe, interesting. I've ruled out relying on the FKeys early in the
> coding, but I'm sure there's infrastructure the patch could use.

It would be very sad to do that work twice in a patch that purports
to reduce planning time. If it's done too late to suit you now,
could we move it to happen earlier?

> What kind of "manipulation" of the joinlist you have in mind?

Right now, if we have four tables to join, we have a joinlist
(A B C D). (Really they're integer relids, but let's use names here.)
If we decide to force C to be joined last, it should be sufficient to
convert this to ((A B D) C). If C and D both look like candidates for
this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
This is pretty much the same thing that happens if you set
join_collapse_limit to 1 and use JOIN syntax to force a join order.
In fact, IIRC we start out with nested joinlists and there is some
code that normally flattens them until it decides it'd be creating
too large a sub-problem. I'm suggesting selectively reversing the
flattening.
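
As an illustration of that transformation, here is a short Python sketch that uses nested tuples in place of the planner's joinlist. The function name is invented and the real joinlist holds relids rather than names, so treat this purely as a model of the nesting.

```python
def force_join_last(joinlist, deferred):
    """Nest a flat joinlist so the deferred relations are joined last.

    joinlist: flat list of relation names, e.g. ["A", "B", "C", "D"]
    deferred: relations to join last, in the order they should be joined
    """
    # Relations that still go through the regular search as one sub-problem.
    core = tuple(rel for rel in joinlist if rel not in deferred)
    nested = core
    # Wrap one more level around the result for each deferred relation.
    for rel in deferred:
        if rel in joinlist:
            nested = (nested, rel)
    return nested


if __name__ == "__main__":
    flat = ["A", "B", "C", "D"]
    print(force_join_last(flat, ["C"]))
    print(force_join_last(flat, ["C", "D"]))
    print(force_join_last(flat, ["D", "C"]))
```

The three calls reproduce the ((A B D) C), (((A B) C) D) and (((A B) D) C) shapes from the message.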

regards, tom lane

From:	Jim Nasby <jnasby(at)upgrade(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 22:28:47
Message-ID:	CAMFBP2pP3Lizcbisp6Q_7Y1_vo4MkywxUkZ=MXt7LcBxGn_GhA@mail.gmail.com

On Tue, Feb 4, 2025 at 3:42 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:

> On 2/4/25 21:23, Tom Lane wrote:
> > Tomas Vondra <tomas(at)vondra(dot)me> writes:
> >> On 2/4/25 20:43, Jeff Davis wrote:
> >>> If you base it on the join conditions rather than the size of the
> >>> table, then detection of the star join would be based purely on the
> >>> query structure (not stats), which would be nice for predictability.
> >
> >> Right, there may be other (possibly better) ways to detect the star join
> >> shape. I was thinking about also requiring for foreign keys on the join
> >> clauses - in DWH systems FKeys are sometimes omitted, which would break
> >> the heuristics, but in OLTP it's common to still have them.
> >
> > I think you need to insist on foreign keys. Otherwise you don't know
> > whether the joins will eliminate fact-table rows. If that's a
> > possibility then it's no longer sensible to ignore different join
> > orders.
>
> Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
> of these star join queries with LEFT JOIN too, and then the FKs are not
> needed. All you need is a PK / unique index on the other side.
>
> Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
> to make this work for most cases, and then the rest would simply use the
> regular join order algorithm.
>
> I was thinking that if we allow the dimensions to eliminate rows in the
> fact table, we'd simply join them starting from the most selective ones.
> But that doesn't work if the joins might have different per-row costs
> (e.g. because some dimensions are much larger etc). Doing something
> smarter would likely end up fairly close to the regular join order
> algorithm ...
>

As long as the join is still happening there doesn't appear to be a
correctness issue here, so I'm not sure mandating FKs makes sense.

The reason this matters is that highly concurrent FK checks can get VERY
expensive (due to the cost of creating multiXacts). While it'd be great to
fix that issue the reality today is it's not uncommon for people to remove
FKs because of the high performance penalty.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jim Nasby <jnasby(at)upgrade(dot)com>
Cc:	Tomas Vondra <tomas(at)vondra(dot)me>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-04 22:49:58
Message-ID:	[email protected]

Jim Nasby <jnasby(at)upgrade(dot)com> writes:
> On Tue, Feb 4, 2025 at 3:42 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
>> to make this work for most cases, and then the rest would simply use the
>> regular join order algorithm.

> As long as the join is still happening there doesn't appear to be a
> correctness issue here, so I'm not sure mandating FKs makes sense.
> The reason this matters is that highly concurrent FK checks can get VERY
> expensive (due to the cost of creating multiXacts). While it'd be great to
> fix that issue the reality today is it's not uncommon for people to remove
> FKs because of the high performance penalty.

Meh. If we don't apply this optimization when there's no FK, we have
not made those folks' life any worse. If we apply it despite there
being no FK, we might choose a materially worse plan than before, and
that *will* make their lives worse.

regards, tom lane

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-05 08:23:40
Message-ID:	CAMbWs4-dursjLejtFs=16z=kAmZ8Ph1t0GLMxNoE_o4UVLnAEw@mail.gmail.com

On Wed, Feb 5, 2025 at 5:42 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
> of these star join queries with LEFT JOIN too, and then the FKs are not
> needed. All you need is a PK / unique index on the other side.
>
> Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
> to make this work for most cases, and then the rest would simply use the
> regular join order algorithm.
>
> I was thinking that if we allow the dimensions to eliminate rows in the
> fact table, we'd simply join them starting from the most selective ones.
> But that doesn't work if the joins might have different per-row costs
> (e.g. because some dimensions are much larger etc). Doing something
> smarter would likely end up fairly close to the regular join order
> algorithm ...

Yeah, we need to ensure that the joins to the fact table don't affect
its rows; otherwise, the join order matters for the final query plan,
and we'd better run the regular join search algorithm in this case.

For inner joins, using the foreign key seems ideal for this. For left
joins, we might be able to leverage rel_is_distinct_for() to handle
that.

Thanks
Richard

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Tomas Vondra <tomas(at)vondra(dot)me>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-05 08:27:33
Message-ID:	CAMbWs49A0Twaby+PNbvLeTN4sAxWDW+pu7-BALz-Kk8cmUaw2Q@mail.gmail.com

On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Right now, if we have four tables to join, we have a joinlist
> (A B C D). (Really they're integer relids, but let's use names here.)
> If we decide to force C to be joined last, it should be sufficient to
> convert this to ((A B D) C). If C and D both look like candidates for
> this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
> This is pretty much the same thing that happens if you set
> join_collapse_limit to 1 and use JOIN syntax to force a join order.
> In fact, IIRC we start out with nested joinlists and there is some
> code that normally flattens them until it decides it'd be creating
> too large a sub-problem. I'm suggesting selectively reversing the
> flattening.

This should not be too difficult to implement. Outer joins seem to
add some complexity, though. We need to ensure that the resulting
joins in each sub-list are legal given the query's join order
constraints. For example, if we make the joinlist be (((A B) C) D),
we need to ensure that the A/B join and the (A/B)/C join are legal.

Thanks
Richard

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-05 12:22:55
Message-ID:	[email protected]

On 2/5/25 09:27, Richard Guo wrote:
> On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Right now, if we have four tables to join, we have a joinlist
>> (A B C D). (Really they're integer relids, but let's use names here.)
>> If we decide to force C to be joined last, it should be sufficient to
>> convert this to ((A B D) C). If C and D both look like candidates for
>> this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
>> This is pretty much the same thing that happens if you set
>> join_collapse_limit to 1 and use JOIN syntax to force a join order.
>> In fact, IIRC we start out with nested joinlists and there is some
>> code that normally flattens them until it decides it'd be creating
>> too large a sub-problem. I'm suggesting selectively reversing the
>> flattening.
>
> This should not be too difficult to implement. Outer joins seem to
> add some complexity, though. We need to ensure that the resulting
> joins in each sub-list are legal given the query's join order
> constraints. For example, if we make the joinlist be (((A B) C) D),
> we need to ensure that the A/B join and the (A/B)/C join are legal.
>

If the requirement is that all "dimensions" only join to the fact table
(which in this example would be "A" I think) through a FK, then why
would these joins be illegal?

We'd also need to require either an outer (left) join, or "NOT NULL" on
the fact table side, right? IIRC we already do that when using the FKeys
for join estimates.

Essentially, this defines a "dimension" as a relation that is joined
through a PK, without any other restrictions, both of which seem fairly
simple to check, and it's a "local" feature. And we'd simply mark those
as "join at the very end, in arbitrary order". Easy enough, I guess.
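
A rough sketch of that "local" check, assuming a toy representation of the
query: equality join clauses as tuples, PK/unique columns per relation, and
a set of relations that carry their own WHERE clauses. All names here are
hypothetical, and the outer join / NOT NULL requirement is not modeled:

```python
def find_dimensions(join_clauses, unique_cols, restricted):
    """Return relations that look like dimensions under the toy definition.

    join_clauses: list of (rel_a, col_a, rel_b, col_b) equality clauses
    unique_cols:  dict mapping relation -> set of PK/unique column names
    restricted:   set of relations that have their own WHERE clauses
    """
    # Collect, per relation, the (own column, other relation) pairs it joins on.
    uses = {}
    for rel_a, col_a, rel_b, col_b in join_clauses:
        uses.setdefault(rel_a, []).append((col_a, rel_b))
        uses.setdefault(rel_b, []).append((col_b, rel_a))

    dims = []
    for rel, joins in uses.items():
        # A dimension joins to exactly one relation and has no restrictions.
        if len(joins) != 1 or rel in restricted:
            continue
        col, _other = joins[0]
        # The join must go through the dimension's PK / unique column.
        if col in unique_cols.get(rel, set()):
            dims.append(rel)
    return sorted(dims)


clauses = [
    ("f", "id1", "dim1", "id"),
    ("f", "id2", "dim2", "id"),
    ("f", "t_id", "t", "f_ref"),   # "t" is not joined through a unique column
]
unique = {"f": {"id"}, "dim1": {"id"}, "dim2": {"id"}}
print(find_dimensions(clauses, unique, restricted={"f"}))   # ['dim1', 'dim2']
```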

I'm thinking about some more complex cases:

(a) Query with multiple starjoins (a special case of that is snowflake
schema) - but I guess this is not too difficult, it just needs to
consider the FKs as "transitive" (a bit like Dijkstra's algorithm). In
the worst case we might need to "split" the whole query into multiple
smaller subproblems.

(b) Joining additional stuff to the dimensions (not through a FK,
possibly to multiple dimensions, ...). Imagine a "diamond join" with
some summary table, etc. IMHO this is a fairly rare case / expensive
enough to make the planning part less important.

I'm also wondering how this should interact with join_collapse_limit. It
would seem ideal to do this optimization before splitting the join list
into subproblems (so that the "dimensions" do not even count against
the limit, pretty much). But that would mean join_collapse_limit can no
longer be used to enforce a join order like today ...

regards

--
Tomas Vondra

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-05 22:57:07
Message-ID:	CADkLM=ePzKqJxNdcUZ+kLxSDLDh3BNNvMwGzB0H8XwBQWh09Aw@mail.gmail.com
Lists:	pgsql-hackers

>
>
> Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
> of these star join queries with LEFT JOIN too, and then the FKs are not
> needed. All you need is a PK / unique index on the other side.

Indeed, many installations specifically _remove_ foreign keys because of
the dreaded RI check on delete. Basically, if you delete one or more rows
in dim1 then the referencing fact1 must be scanned to ensure that it does
not contain a reference to the deleted row. Often the referencing field on
fact1 is not indexed, because the index is almost never useful in an actual
select query, so even if you did index it several unused index metrics will
identify it as a candidate for deletion. What you get is one sequential
scan of fact1 for every row deleted from dim1. Now, we could get around
this by changing how we do delete RI checks, either by moving to statement
level triggers or bypassing triggers entirely, but until we do so, it is
likely that many customers avoid otherwise useful FK references.

From:	James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-07 19:11:24
Message-ID:	CAJVSvF5TLJCF3uc3Cua5uGdK4bvfGtmTPrqVpwu=2n2wKk9UTA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Feb 5, 2025 at 4:23 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> If the requirement is that all "dimensions" only join to the fact table
> (which in this example would be "A" I think) through a FK, then why
> would these joins be illegal?
>
> ...
> Essentially, this defines a "dimension" as a relation that is joined
> through a PK, without any other restrictions, both of which seem fairly
> simple to check, and it's a "local" feature. And we'd simply mark those
> as "join at the very end, in arbitrary order". Easy enough, I guess.

As I understand your proposal, you want to detect queries that join a
large number, N, of tables -- which means doing an exhaustive search
of all possible join orders is expensive -- where N - 1 of the tables
do not join to each other, but join only to the Nth table.

PostgreSQL already falls back on geqo when it hits some heuristic that
says exhaustive search is too expensive, but you're proposing an
additional, better heuristic.

Say we have F JOIN D1 JOIN D2 ... JOIN D(N-1). In the example you
gave, the single-table predicate on F makes it small enough, I think,
that F will be the "build" side of any Hash Join, right? You're
assuming, I think, that the cardinality |F| = 1, after applying the
filter to F. And so, |F JOIN Dk| will be approximately 1, for any 1 <=
k < N. So then the join order does not matter. I think this is what
you mean by "OLTP star join."

For *OLAP* star joins, Oracle's Star Transformation [1] works
reasonably well, where Oracle scans D1, ..., D(N-1) first, constructs
Bloom filters, etc., and then "pushes" the N-1 joins down into the Seq
Scan on F.

So, I suggest:
1. Add an *OLTP* optimization similar to what you described, but
instead of classifying the largest table as fact F, look for the "hub"
of the star and classify it as F. And then enable your optimization if
and only if the estimated nrows for F is very small.

2. For an *OLAP* optimization, do something like Oracle's Star Transformation.

Re "OLTP" vs. "OLAP": the join order does not matter for *OLTP* star
queries, because the fact table F is *small* (post-filtering). And
because F is small, it doesn't matter so much in what order you join
the dimension tables, because the result is "likely" to be small as
well.

Tom correctly points out that you really need foreign key constraints
to ensure the previous sentence's "likely," but since your
optimization is just intended to avoid considering unnecessary join
orders, you may be able to get away with asking the optimizer what it
thinks the cardinality |(... (F JOIN D1) ... JOIN Dk)| would be, and
just fall back on the existing join-search logic when the optimizer
thinks that Dk will create lots of rows (and so the join order
matters...).

So much for the OLTP case. For completeness, some discussion about the
OLAP case; fwiw, let me start by plugging my "credentials" [2].

The OLAP case is more complicated than the OLTP case, in that the bad
thing about *OLAP* star joins is that joins are pairwise. With OLAP
star joins, you assume that |F| is always much larger than |Dk|, and
by extension |(... (F JOIN D1) ... JOIN Dk)| is generally larger than
|D(k+1)|. And the problem for OLAP is that while every Dk potentially
filters rows out from F, you have to join to the Dk's one at a time,
so you never get as much filtering as you'd like.

For OLAP, you can take the Cartesian product of D1, ..., DN, and then
scan F to aggregate into the resulting cube; see [3]. (Link [2] is
related to transformation.)

Or, you can scan D1, ..., DN first, without joining anything,
constructing Hash tables and Bloom filters from your scans; then push
the Bloom filters down to the scan of F; and finally join the
(Bloom-filtered) F back to D1, ..., DN. This is what link [1]
describes. Note that [1] came out before [3].
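
A toy sketch of the scan-dimensions-first approach described above. It
assumes plain Python sets as a stand-in for Bloom filters (so there are no
false positives) and made-up table contents; it is not how Oracle or
PostgreSQL implement anything:

```python
def filter_then_join(fact_rows, dims):
    """Scan dimensions first, filter the fact scan, then join back.

    dims maps a fact foreign-key column to a dict of the dimension rows
    that survived that dimension's own predicates, keyed by primary key.
    Plain sets stand in for Bloom filters (so no false positives here).
    """
    # 1. Build one membership filter per dimension, without joining anything.
    filters = {fk_col: set(rows) for fk_col, rows in dims.items()}
    # 2. Apply all filters at once during the scan of the fact table.
    survivors = [row for row in fact_rows
                 if all(row[fk_col] in keys for fk_col, keys in filters.items())]
    # 3. Join the (much smaller) filtered fact rows back to the dimensions.
    return [{**row, **{fk_col + "_attr": dims[fk_col][row[fk_col]] for fk_col in dims}}
            for row in survivors]


fact = [{"id": i, "d1": i % 4, "d2": i % 3} for i in range(12)]
dims = {"d1": {0: "red"}, "d2": {0: "north", 1: "south"}}
result = filter_then_join(fact, dims)
print([row["id"] for row in result])   # [0, 4]
```

The point of step 2 is that every dimension filters the fact scan at the
same time, instead of one pairwise join at a time.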

However... for OLAP, you see from the above discussion that it's not
compilation that takes too long, but rather execution. So the
optimizations require significant changes to the SQL executor.

What you're proposing, IIUC, is a nice optimization to compilation
times, which is why (I think) you're focused on the OLTP use case. In
that case, I suggest focusing on an OLTP-specific solution, maybe a
straw man like:
1. I see a query where N-1 relations join to the Nth relation, but not
to each other (except transitively, of course).
2. Estimated cardinality for F, after pushing down single table
predicates, is very small.
3. OK, let's start joining tables D1, ..., D(N-1) in order, since
we're assuming (thanks to (1) and (2)) that the join order won't
matter.
4. Continue joining tables in this fixed (arbitrary) order, unless we
come to a Dk where the optimizer thinks joining to Dk will generate a
significant number of rows.
5. Either we join all tables in order (fast compilation!); or we hit
the case in (4), so we just fall back on the existing join logic.
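
The straw man above could be sketched roughly as follows. This assumes
step 1 (the star shape) has already been verified, and both the
estimate_rows callback and the max_rows threshold are hypothetical
stand-ins for whatever the optimizer would actually provide:

```python
def fast_path_join_order(fact, dims, estimate_rows, max_rows=10.0):
    """Return a fixed join order, or None to fall back to the regular search.

    estimate_rows(order) is a hypothetical callback standing in for the
    optimizer's cardinality estimate of joining `order` left to right.
    """
    order = [fact]
    # Step 2: the filtered fact table must be small to begin with.
    if estimate_rows(order) > max_rows:
        return None
    # Steps 3-4: join dimensions in the given (arbitrary) order.
    for dim in dims:
        order.append(dim)
        if estimate_rows(order) > max_rows:
            # Step 5: this join blows up the row count, so give up.
            return None
    return order


# Made-up per-relation fan-out factors for the example.
fanout = {"F": 1.0, "D1": 1.0, "D2": 1.0, "D3": 500.0}


def estimate(order):
    rows = 1.0
    for rel in order:
        rows *= fanout[rel]
    return rows


print(fast_path_join_order("F", ["D1", "D2"], estimate))        # ['F', 'D1', 'D2']
print(fast_path_join_order("F", ["D1", "D3", "D2"], estimate))  # None
```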

Thanks,
James

[1] https://blogs.oracle.com/optimizer/post/optimizer-transformations-star-transformation
[2] https://patents.google.com/patent/US20150088856A1/en
[3] https://docs.oracle.com/en/database/oracle/oracle-database/12.2/inmem/optimizing-in-memory-aggregation.html

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-07 20:09:00
Message-ID:	[email protected]
Lists:	pgsql-hackers

On 2/7/25 20:11, James Hunter wrote:
> On Wed, Feb 5, 2025 at 4:23 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> If the requirement is that all "dimensions" only join to the fact table
>> (which in this example would be "A" I think) through a FK, then why
>> would these joins be illegal?
>>
>> ...
>> Essentially, this defines a "dimension" as a relation that is joined
>> through a PK, without any other restrictions, both of which seem fairly
>> simple to check, and it's a "local" feature. And we'd simply mark those
>> as "join at the very end, in arbitrary order". Easy enough, I guess.
>
> As I understand your proposal, you want to detect queries that join a
> large number, N, of tables -- which means doing an exhaustive search
> of all possible join orders is expensive -- where N - 1 of the tables
> do not join to each other, but join only to the Nth table.
>
Yes. Essentially, it reduces the size of the problem by ignoring joins
for which we know the optimal order. We know the dimensions can be
joined last, and it does not matter in which exact order we join them.

The starjoins are a bit of the "worst case" for our heuristics, because
there are no dependencies between the dimensions, and we end up
exploring the n! possible join orders, more or less. For other joins we
quickly prune the space.

> PostgreSQL already falls back on geqo when it hits some heuristic that
> says exhaustive search is too expensive, but you're proposing an
> additional, better heuristic.

True, but most people will never actually hit the GEQO, because the
default thresholds are set like this:

join_collapse_limit = 8
geqo_threshold = 12

So the planner will not "create" join search problems with more than 8
relations, but geqo only kicks in at 12. Most systems run with the
default values for these GUCs, so they don't really use GEQO.
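
As a deliberately simplified model of how the two settings interact (it
ignores from_collapse_limit and the way larger queries get split into
several nested subproblems, and search_strategy() is a made-up name):

```python
def search_strategy(n_rels, join_collapse_limit=8, geqo_threshold=12):
    """Toy model of which join search a single subproblem would use."""
    # Simplification: explicit JOINs are merged into one search problem
    # only up to join_collapse_limit relations.
    subproblem = min(n_rels, join_collapse_limit)
    # GEQO is used for problems with at least geqo_threshold relations.
    return "geqo" if subproblem >= geqo_threshold else "exhaustive"


print(search_strategy(11))                           # exhaustive
print(search_strategy(20))                           # exhaustive
print(search_strategy(12, join_collapse_limit=12))   # geqo
```

With the defaults, the subproblem size never reaches 12 in this model,
which matches the observation that most systems never use GEQO.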

FWIW I don't know a lot about the GEQO internals, but I heard it doesn't
work all that well for the join order problem anyway. Not sure.

> Say we have F JOIN D1 JOIN D2 ... JOIN D(N-1). In the example you
> gave, the single-table predicate on F makes it small enough, I think,
> that F will be the "build" side of any Hash Join, right? You're
> assuming, I think, that the cardinality |F| = 1, after applying the
> filter to F. And so, |F JOIN Dk| will be approximately 1, for any 1 <=
> k < N. So then the join order does not matter. I think this is what
> you mean by "OLTP star join."
>

I don't think it matters very much on which side of the join the F will
end up (or if it's a hash join, it can easily be NL). It will definitely
be in the first join, though, because all other dimensions join to it
(assuming this is just a starjoin, with only fact + dimensions).

It also doesn't really matter what's the exact cardinality of |F|. The
example used a PK lookup, so that would be 1 row, but the point is that
this is (much) cheaper than the planning. E.g. the planning might take
3ms while the execution only takes 1ms, etc. In the OLAP cases this is
usually not the case, because the queries are processing a lot of data
from the fact table, and the planning is negligible.

> For *OLAP* star joins, Oracle's Star Transformation [1] works
> reasonably well, where Oracle scans D1, ..., D(N-1) first, constructs
> Bloom filters, etc., and then "pushes" the N-1 joins down into the Seq
> Scan on F.
>

I don't care about OLAP star joins, at least not in this patch. It's a
completely different / separate use case, and it affects very different
parts of the planner (and also the executor, which this patch does not
need to touch at all).

> So, I suggest:
> 1. Add an *OLTP* optimization similar to what you described, but
> instead of classifying the largest table as fact F, look for the "hub"
> of the star and classify it as F. And then enable your optimization if
> and only if the estimated nrows for F is very small.
>

Right. I believe this is mostly what looking for FKs (as suggested by
Tom) would end up doing. It doesn't need to consider the cardinality of
F at all.

> 2. For an *OLAP* optimization, do something like Oracle's Star
> Transformation.
>

I consider that well outside the scope of this patch.

> Re "OLTP" vs. "OLAP": the join order does not matter for *OLTP* star
> queries, because the fact table F is *small* (post-filtering). And
> because F is small, it doesn't matter so much in what order you join
> the dimension tables, because the result is "likely" to be small as
> well.
>

I don't think that's quite true. The order of dimension joins does not
matter because the joins do not affect the join size at all. The size of
|F| has nothing to do with that, I think. We'll do the same number of
lookups against the dimensions no matter in what order we join them. And
we know it's best to join them as late as possible, after all the joins
that reduce the size (and before joins that "add" rows, I think).
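
A small illustration of that claim, using made-up data where every fact
row references an existing dimension row (i.e. a valid, non-NULL FK) and
the dimensions have no restriction clauses of their own:

```python
from itertools import permutations


def intermediate_sizes(fact_rows, dims, order):
    """Row counts after each inner join, joining dimensions in `order`.

    dims maps a foreign-key column of the fact rows to a dict keyed by the
    dimension's primary key. All names here are made up for illustration.
    """
    rows = fact_rows
    sizes = [len(rows)]
    for fk_col in order:
        dim = dims[fk_col]
        # PK lookup: each fact row matches at most one dimension row.
        rows = [{**row, fk_col + "_attr": dim[row[fk_col]]}
                for row in rows if row[fk_col] in dim]
        sizes.append(len(rows))
    return sizes


fact = [{"id": i, "d1": i % 3, "d2": i % 2} for i in range(6)]
dims = {"d1": {0: "a", 1: "b", 2: "c"}, "d2": {0: "x", 1: "y"}}

for order in permutations(dims):
    print(order, intermediate_sizes(fact, dims, order))
# ('d1', 'd2') [6, 6, 6]
# ('d2', 'd1') [6, 6, 6]
```

Every intermediate result has the same size in either order, so the number
of lookups against each dimension is the same too.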

> Tom correctly points out that you really need foreign key constraints
> to ensure the previous sentence's "likely," but since your
> optimization is just intended to avoid considering unnecessary join
> orders, you may be able to get away with asking the optimizer what it
> thinks the cardinality |(... (F JOIN D1) ... JOIN Dk)| would be, and
> just fall back on the existing join-search logic when the optimizer
> thinks that Dk will create lots of rows (and so the join order
> matters...).
>

Possibly, but TBH the join cardinality estimates can be quite dubious
pretty easily. The FK is much more reliable (definitive) information.

> So much for the OLTP case. For completeness, some discussion about the
> OLAP case; fwiw, let me start by plugging my "credentials" [2].
>

Thanks ;-)

> The OLAP case is more complicated than the OLTP case, in that the bad
> thing about *OLAP* star joins is that joins are pairwise. With OLAP
> star joins, you assume that |F| is always much larger than |Dk|, and
> by extension |(... (F JOIN D1) ... JOIN Dk)| is generally larger than
> |D(k+1)|. And the problem for OLAP is that while every Dk potentially
> filters rows out from F, you have to join to the Dk's one at a time,
> so you never get as much filtering as you'd like.
>
> For OLAP, you can take the Cartesian product of D1, ..., DN , and then
> scan F to aggregate into the resulting cube; see [3] . (Link [2] is
> related to transformation.)
>
> Or, you can scan D1, ..., DN first, without joining anything,
> constructing Hash tables and Bloom filters from your scans; then push
> the Bloom filters down to the scan of F; and finally join the
> (Bloom-filtered) F back to D1, ..., DN. This is what link [1]
> describes. Note that [1] came out before [3].
>
> However... for OLAP, you see from the above discussion that it's not
> compilation that takes too long, but rather execution. So the
> optimizations require significant changes to the SQL executor.
>

Agreed. I'm not against improving the OLAP case too, but it's not what
this thread was about. It seems it'll need changes in very different
places, etc.

> What you're proposing, IIUC, is a nice optimization to compilation
> times, which is why (I think) you're focused on the OLTP use case. In
> that case, I suggest focusing on an OLTP-specific solution, maybe a
> straw man like:
> 1. I see a query where N-1 relations join to the Nth relation, but not
> to each other (except transitively, of course).
> 2. Estimated cardinality for F, after pushing down single table
> predicates, is very small.
> 3. OK, let's start joining tables D1, ..., D(N-1) in order, since
> we're assuming (thanks to (1) and (2)) that the join order won't
> matter.
> 4. Continue joining tables in this fixed (arbitrary) order, unless we
> come to a Dk where the optimizer thinks joining to Dk will generate a
> significant number of rows.
> 5. Either we join all tables in order (fast compilation!); or we hit
> the case in (4), so we just fall back on the existing join logic.
>

Yes, I think that's pretty much the idea. Except that I don't think we
need to look at the |F| at all - it will have more impact for small |F|,
of course, but it doesn't hurt for large |F|.

I think it'll probably need to consider which joins increase/decrease
the cardinality, and "inject" the dimension joins in between those.

regards

--
Tomas Vondra

From:	James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-07 22:43:12
Message-ID:	CAJVSvF7yu0fypZPymFRBJks4_Fg_TakxVgd5gJVNvCx8+0tdmw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Feb 7, 2025 at 12:09 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> ...
> Yes, I think that's pretty much the idea. Except that I don't think we
> need to look at the |F| at all - it will have more impact for small |F|,
> of course, but it doesn't hurt for large |F|.
>
> I think it'll probably need to consider which joins increase/decrease
> the cardinality, and "inject" the dimension joins in between those.

YMMV, but I suspect you may find it much easier to look at |F|, |F
JOIN D1|, |(F JOIN D1) JOIN D2|, etc., than to consider |F JOIN D1| /
|F|, etc. (In other words, I suspect that considering absolute
cardinalities will end up easier/cleaner than considering ratios of
increases/decreases in cardinalities.) But I have not thought about
this much, so I am not putting too much weight on my suspicions.

James

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-07 23:29:11
Message-ID:	[email protected]
Lists:	pgsql-hackers

On 2/7/25 23:43, James Hunter wrote:
> On Fri, Feb 7, 2025 at 12:09 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> ...
>> Yes, I think that's pretty much the idea. Except that I don't think we
>> need to look at the |F| at all - it will have more impact for small |F|,
>> of course, but it doesn't hurt for large |F|.
>>
>> I think it'll probably need to consider which joins increase/decrease
>> the cardinality, and "inject" the dimension joins in between those.
>
> YMMV, but I suspect you may find it much easier to look at |F|, |F
> JOIN D1|, |(F JOIN D1) JOIN D2|, etc., than to consider |F JOIN D1| /
> |F|, etc. (In other words, I suspect that considering absolute
> cardinalities will end up easier/cleaner than considering ratios of
> increases/decreases in cardinalities.) But I have not thought about
> this much, so I am not putting too much weight on my suspicions.
>

That's not what I meant when I mentioned joins that increase/decrease
cardinality. That wasn't referring to the "dimension" joins, which we
expect to have FK and thus should not affect the cardinality at all.

Instead, I was thinking about the "other" joins (if there are any), that
may add or remove rows. AFAIK we want to join the dimensions at the
place with the lowest cardinality - the discussion mostly assumed the
joins would only reduce the cardinality, in which case we'd just leave
the dimensions until the very end.

But ISTM that may not be necessarily true. Let's say there's a join that
"multiplies" each row. It'll probably be done at the end, and the
dimension joins should probably happen right before it ... not sure.

cheers

--
Tomas Vondra

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-08 00:23:17
Message-ID:	[email protected]
Lists:	pgsql-hackers

Tomas Vondra <tomas(at)vondra(dot)me> writes:
> Instead, I was thinking about the "other" joins (if there are any), that
> may add or remove rows. AFAIK we want to join the dimensions at the
> place with the lowest cardinality - the discussion mostly assumed the
> joins would only reduce the cardinality, in which case we'd just leave
> the dimensions until the very end.

> But ISTM that may not be necessarily true. Let's say there's a join that
> "multiplies" each row. It'll probably be done at the end, and the
> dimension joins should probably happen right before it ... not sure.

I thought the idea here was to get rid of as much join order searching
as we could. Insisting that we get the best possible plan anyway
seems counterproductive, not to mention very messy to implement.
So I'd just push all these joins to the end and be done with it.

regards, tom lane

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-08 01:49:57
Message-ID:	[email protected]
Lists:	pgsql-hackers

On 2/8/25 01:23, Tom Lane wrote:
> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>> Instead, I was thinking about the "other" joins (if there are any), that
>> may add or remove rows. AFAIK we want to join the dimensions at the
>> place with the lowest cardinality - the discussion mostly assumed the
>> joins would only reduce the cardinality, in which case we'd just leave
>> the dimensions until the very end.
>
>> But ISTM that may not be necessarily true. Let's say there's a join that
>> "multiplies" each row. It'll probably be done at the end, and the
>> dimension joins should probably happen right before it ... not sure.
>
> I thought the idea here was to get rid of as much join order searching
> as we could. Insisting that we get the best possible plan anyway
> seems counterproductive, not to mention very messy to implement.
> So I'd just push all these joins to the end and be done with it.
>

Right, cutting down on the join order searching is the point. But most
of the savings comes (I think) from not considering different ordering
of the dimensions, because those are all essentially the same.

Consider a join with 16 relations, 10 of which are dimensions. There are
10! possible orders of the dimensions, but all of them behave pretty
much exactly the same. In a way, this behaves almost like a join with 7
relations, one of which represents all the 10 dimensions.

I think this "join group" abstraction (a relation representing a bunch
of relations in a particular order) would make this reasonably clean to
implement. I haven't tried yet, though.

Yes, this means we'd explore more orderings, compared to just pushing
all the dimensions to the end. In the example above, that'd be 7!/6!, so
up to ~7x orderings.
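
The arithmetic, spelled out as a rough count of left-deep orderings only
(the real planner prunes and uses dynamic programming, so these numbers
are an illustration of the ratio, not of actual planning work):

```python
from math import factorial


def orderings(n_rels):
    """Left-deep join orders for n_rels relations, ignoring any pruning."""
    return factorial(n_rels)


others, dims = 6, 10   # 16 relations, 10 of which are dimensions

# All dimensions pushed to the end in one fixed order: only the others vary.
pushed_to_end = orderings(others)
# Dimensions treated as a single "join group" placed anywhere among the others.
with_join_group = orderings(others + 1)

print(pushed_to_end)                      # 720
print(with_join_group)                    # 5040
print(with_join_group // pushed_to_end)   # 7
print(orderings(dims))                    # 3628800 orders of the dimensions alone
```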

I don't know if this is worth the extra complexity, of course.

thanks

--
Tomas Vondra

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-10 01:32:47
Message-ID:	[email protected]
Lists:	pgsql-hackers

I've not done anything like this with joins before, but AFAICS the
interesting stuff happens in deconstruct_recurse(), especially close to
the end where we check join_collapse_limit and do

joinlist = list_make2(leftpart, rightpart);

So I guess one way to "reverse the flattening" could be to do something
in deconstruct_recurse(). But I don't think that'd work all that well,
because of the recursion. We don't want to add a "pipeline break" into
the join list, we want to move the "dimension" to the end - even if only
within the group defined by join_collapse_limit.

E.g. imagine we have a join of 8 relations, with F (fact), dimensions D1
and D2, and then some arbitrary tables T1, T2, T3, T4, T5. And let's say
deconstruct_recurse() sees them in this particular order

[F, T1, T2, D1, D2, T3, T4, T5]

AFAICS doing something in deconstruct_recurse() would likely split the
joinlist into four parts

[F, T1, T2] [D1] [D2] [T3, T4, T5]

which does treat the D1,D2 as if join_collapse_limit=1, but it also
splits the remaining relations into two groups, when we'd probably want
something more like this:

[F, T1, T2, T3, T4, T5] [D1] [D2]

Which should be legal, because a requirement is that D1/D2 don't have
any other join restrictions (I guess this could be relaxed a bit to only
restrictions within that particular group).

Which leads me to the conclusion that the best place to do this kind of
stuff is deconstruct_jointree(), once we have the "complete" joinlist.
We could walk it and reorder/split some of the joinlists again.
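
To make the difference between the two outcomes concrete, here is a toy
sketch with flat Python lists standing in for joinlists. Both helper names
are hypothetical, and neither checks whether the resulting joins are
legal:

```python
def split_in_place(joinlist, dimensions):
    """What splitting at each dimension during the recursion tends to give."""
    groups, current = [], []
    for rel in joinlist:
        if rel in dimensions:
            # A dimension ends the current group (a "pipeline break").
            if current:
                groups.append(current)
                current = []
            groups.append([rel])
        else:
            current.append(rel)
    if current:
        groups.append(current)
    return groups


def move_dimensions_last(joinlist, dimensions):
    """Reorder the complete joinlist: one search group, then the dimensions."""
    core = [rel for rel in joinlist if rel not in dimensions]
    tail = [[rel] for rel in joinlist if rel in dimensions]
    return [core] + tail


joinlist = ["F", "T1", "T2", "D1", "D2", "T3", "T4", "T5"]
print(split_in_place(joinlist, {"D1", "D2"}))
# [['F', 'T1', 'T2'], ['D1'], ['D2'], ['T3', 'T4', 'T5']]
print(move_dimensions_last(joinlist, {"D1", "D2"}))
# [['F', 'T1', 'T2', 'T3', 'T4', 'T5'], ['D1'], ['D2']]
```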

regards

--
Tomas Vondra

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-10 07:29:19
Message-ID:	CAMbWs4_sNDrRncOT4XVycAxdm7q-0U+4JDOq7Y_v7FTQG7LDwg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 10, 2025 at 9:32 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> E.g. imagine we have a join of 8 relations, with F (fact), dimensions D1
> and D2, and then some arbitrary tables T1, T2, T3, T4, T5. And let's say
> deconstruct_recurse() sees them in this particular order
>
> [F, T1, T2, D1, D2, T3, T4, T5]
>
> AFAICS doing something in deconstruct_recurse() would likely split the
> joinlist into four parts
>
> [F, T1, T2] [D1] [D2] [T3, T4, T5]
>
> which does treat the D1,D2 as if join_collapse_limit=1, but it also
> splits the remaining relations into two groups, when we'd probably want
> something more like this:
>
> [F, T1, T2, T3, T4, T5] [D1] [D2]
>
> Which should be legal, because a requirement is that D1/D2 don't have
> any other join restrictions (I guess this could be relaxed a bit to only
> restrictions within that particular group).

Hmm, I'm still a little concerned about whether the resulting joins
are legal. Suppose we have a join pattern like the one below.

F left join
(D1 inner join T on true) on F.b = D1.b
left join D2 on F.c = D2.c;

For this query, the original joinlist is [F, D1, T, D2]. If we
reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to
produce any joins, as the F/T join is not legal.

This may not be the pattern we are targeting. But if we intend to
support it, I think we need a way to ensure that the resulting joins
are legal.

Thanks
Richard

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-10 09:35:31
Message-ID:	[email protected]
Lists:	pgsql-hackers

On 2/10/25 08:29, Richard Guo wrote:
> On Mon, Feb 10, 2025 at 9:32 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> E.g. imagine we have a join of 8 relations, with F (fact), dimensions D1
>> and D2, and then some arbitrary tables T1, T2, T3, T4, T5. And let's say
>> deconstruct_recurse() sees them in this particular order
>>
>> [F, T1, T2, D1, D2, T3, T4, T5]
>>
>> AFAICS doing something in deconstruct_recurse() would likely split the
>> joinlist into four parts
>>
>> [F, T1, T2] [D1] [D2] [T3, T4, T5]
>>
>> which does treat the D1,D2 as if join_collapse_limit=1, but it also
>> splits the remaining relations into two groups, when we'd probably want
>> something more like this:
>>
>> [F, T1, T2, T3, T4, T5] [D1] [D2]
>>
>> Which should be legal, because a requirement is that D1/D2 don't have
>> any other join restrictions (I guess this could be relaxed a bit to only
>> restrictions within that particular group).
>
> Hmm, I'm still a little concerned about whether the resulting joins
> are legal. Suppose we have a join pattern like the one below.
>
> F left join
> (D1 inner join T on true) on F.b = D1.b
> left join D2 on F.c = D2.c;
>
> For this query, the original joinlist is [F, D1, T, D2]. If we
> reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to
> produce any joins, as the F/T join is not legal.
>
> This may not be the pattern we are targeting. But if we intend to
> support it, I think we need a way to ensure that the resulting joins
> are legal.
>

It's quite possible the PoC patch I posted fails to ensure this, but I
think the assumption is that we'd not reorder joins for dimensions that
have any join order restrictions (except for the FK join).

regards

--
Tomas Vondra

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-10 21:36:13
Message-ID:	CA+TgmoYGU9amJt_P_H28rcZoaehJVN8ttaD5xL1p5Zb1HXX04w@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Feb 7, 2025 at 3:09 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> I don't think that's quite true. The order of dimension joins does not
> matter because the joins do not affect the join size at all. The size of
> |F| has nothing to do with that, I think. We'll do the same number of
> lookups against the dimensions no matter in what order we join them. And
> we know it's best to join them as late as possible, after all the joins
> that reduce the size (and before joins that "add" rows, I think).

This is often not quite true, because there are often restriction
clauses on the fact tables that result in some rows being eliminated.

e.g. SELECT * FROM hackers h JOIN languages l ON h.language_id = l.id
JOIN countries c ON h.country_id = c.id WHERE c.name = 'Czechia';

However, I think that trying to somehow leverage the existence of
either FK or LJ+UNIQUE is still a pretty good idea. In a lot of cases,
many of the joins don't change the row count, so we don't really need
to explore all possible orderings of those joins. We might be able to
define some concept of "join that doesn't change the row count at all"
or maybe better "join that doesn't change the row count very much".
And then if we have a lot of such joins, we can consider them as a
group. Say we have 2 joins that do change the row count significantly,
and then 10 more that don't. The 10 that don't can be done before,
between, or after the two that do, but it doesn't seem necessary to
consider doing some of them at one point and some at another.
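
To make the counting argument concrete, here is a small sketch in Python. It is purely illustrative and not planner code: the function name is made up, it counts simple linear join sequences rather than the join trees the planner really builds, and it assumes the row-count-preserving joins are moved around as one indivisible block whose internal order is picked arbitrarily.

```python
from math import factorial


def grouped_orderings(n_changing: int, n_preserving: int) -> int:
    """Orderings when the row-count-preserving joins move as one block.

    The block counts as one extra item placed among the n_changing joins;
    its internal order is fixed arbitrarily, because it is assumed not to
    affect the row count.
    """
    if n_preserving == 0:
        return factorial(n_changing)
    return factorial(n_changing + 1)


all_orderings = factorial(2 + 10)            # all 12 joins permuted freely
block_orderings = grouped_orderings(2, 10)   # 2 joins + 1 block of 10

print(all_orderings)    # 479001600
print(block_orderings)  # 6
```

The 6 comes from three placements of the block (before, between, or after the two row-count-changing joins) times the two orders of those joins.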

Maybe that's not the right way to think about this problem; I haven't
read the academic literature on star-join optimization. But it has
always felt stupid to me that we spend a bunch of time considering
join orders that are not meaningfully different, and I think what
makes two join orders not meaningfully different is that we're
commuting joins that are not changing the row count.

(Also worth noting: even if joins of this general form change the row
count, they can only reduce it.)

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-11 09:28:47
Message-ID:	CAMbWs49DZdSF1NSiD3GD26yb6+uLOwt49HH8UccNcvUP_XfGog@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 10, 2025 at 5:35 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> On 2/10/25 08:29, Richard Guo wrote:
> > Hmm, I'm still a little concerned about whether the resulting joins
> > are legal. Suppose we have a join pattern like the one below.
> >
> > F left join
> > (D1 inner join T on true) on F.b = D1.b
> > left join D2 on F.c = D2.c;
> >
> > For this query, the original joinlist is [F, D1, T, D2]. If we
> > reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to
> > produce any joins, as the F/T join is not legal.
> >
> > This may not be the pattern we are targeting. But if we intend to
> > support it, I think we need a way to ensure that the resulting joins
> > are legal.

> It's quite possible the PoC patch I posted fails to ensure this, but I
> think the assumption is we'd not reorder joins for dimensions that have
> any join order restrictions (except for the FK join).

Then, we'll need a way to determine if a given relation has join-order
restrictions, which doesn't seem like a trivial task. We do have the
has_join_restriction() function, but it considers any relations
involved in an outer join as having join restrictions, and that makes
it unsuitable for our needs here.

Thanks
Richard

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-11 15:14:04
Message-ID:	[email protected]
Lists:	pgsql-hackers

On 2/10/25 22:36, Robert Haas wrote:
> On Fri, Feb 7, 2025 at 3:09 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> I don't think that's quite true. The order of dimension joins does not
>> matter because the joins do not affect the join size at all. The size of
>> |F| has nothing to do with that, I think. We'll do the same number of
>> lookups against the dimensions no matter in what order we join them. And
>> we know it's best to join them as late as possible, after all the joins
>> that reduce the size (and before joins that "add" rows, I think).
>
> This is often not quite true, because there are often restriction
> clauses on the fact tables that result in some rows being eliminated.
>
> e.g. SELECT * FROM hackers h JOIN languages l ON h.language_id = l.id
> JOIN countries c ON h.country_id = c.id WHERE c.name = 'Czechia';
>

True. I think this was discussed earlier in this thread - dimensions
with additional restrictions may affect the row count, and thus would be
exempt from this (and would instead go through the "regular" join order
search algorithm). So in my message I assumed the "dimensions" don't have
any such restrictions; I should have mentioned that.

> However, I think that trying to somehow leverage the existence of
> either FK or LJ+UNIQUE is still a pretty good idea. In a lot of cases,
> many of the joins don't change the row count, so we don't really need
> to explore all possible orderings of those joins. We might be able to
> define some concept of "join that doesn't change the row count at all"
> or maybe better "join that doesn't change the row count very much".
> And then if we have a lot of such joins, we can consider them as a
> group. Say we have 2 joins that do change the row count significantly,
> and then 10 more that don't. The 10 that don't can be done before,
> between, or after the two that do, but it doesn't seem necessary to
> consider doing some of them at one point and some at another.
>
> Maybe that's not the right way to think about this problem; I haven't
> read the academic literature on star-join optimization. But it has
> always felt stupid to me that we spend a bunch of time considering
> join orders that are not meaningfully different, and I think what
> makes two join orders not meaningfully different is that we're
> commuting joins that are not changing the row count.
>
> (Also worth noting: even if joins of this general form change the row
> count, they can only reduce it.)
>

I searched for papers on star-joins, but pretty much everything I found
focuses on the OLAP case. Which is interesting; I think the OLTP
star-join I described is quite different, and I'm not sure the
trade-offs are necessarily the same.

My gut feeling is that in the first "phase" we should focus on the case
with no restrictions - that makes it obvious what the optimal order is,
and it will help a significant fraction of cases.

And then in the next step we can try doing something for cases with
restrictions - be it some sort of greedy algorithm, something that
leverages knowledge of the selectivity to prune join orders early
(instead of actually exploring all N! join orders like today). Or
something else, not sure.
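
As a rough illustration of what "some sort of greedy algorithm" might look like, here is a hypothetical Python sketch. It is one possible heuristic, not something the patch implements. All names are invented, and it assumes each dimension joins only to the fact table, that a join multiplies the current row count by a known selectivity, and that a join costs one lookup per input row.

```python
from typing import Dict, List, Tuple


def greedy_join_order(fact_rows: float,
                      selectivity: Dict[str, float]) -> Tuple[List[str], float]:
    """Order star-join dimensions greedily, most selective join first.

    Returns the chosen order and the total number of lookups under the
    (assumed) cost model of one lookup per row entering each join.
    """
    order = sorted(selectivity, key=lambda dim: selectivity[dim])
    rows, lookups = fact_rows, 0.0
    for dim in order:
        lookups += rows            # one lookup per row entering the join
        rows *= selectivity[dim]   # rows surviving the join
    return order, lookups


order, lookups = greedy_join_order(
    1000.0, {"dim_a": 1.0, "dim_b": 0.25, "dim_c": 0.5})

print(order)    # ['dim_b', 'dim_c', 'dim_a']
print(lookups)  # 1375.0
```

A single sort replaces the N! enumeration, at the price of trusting the selectivity estimates and ignoring any interaction between the joins.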

regards

--
Tomas Vondra

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-02-11 15:29:16
Message-ID:	[email protected]
Lists:	pgsql-hackers

On 2/11/25 10:28, Richard Guo wrote:
> On Mon, Feb 10, 2025 at 5:35 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> On 2/10/25 08:29, Richard Guo wrote:
>>> Hmm, I'm still a little concerned about whether the resulting joins
>>> are legal. Suppose we have a join pattern like the one below.
>>>
>>> F left join
>>> (D1 inner join T on true) on F.b = D1.b
>>> left join D2 on F.c = D2.c;
>>>
>>> For this query, the original joinlist is [F, D1, T, D2]. If we
>>> reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to
>>> produce any joins, as the F/T join is not legal.
>>>
>>> This may not be the pattern we are targeting. But if we intend to
>>> support it, I think we need a way to ensure that the resulting joins
>>> are legal.
>
>> It's quite possible the PoC patch I posted fails to ensure this, but I
>> think the assumption is we'd not reorder joins for dimensions that have
>> any join order restrictions (except for the FK join).
>
> Then, we'll need a way to determine if a given relation has join-order
> restrictions, which doesn't seem like a trivial task. We do have the
> has_join_restriction() function, but it considers any relations
> involved in an outer join as having join restrictions, and that makes
> it unsuitable for our needs here.
>

I admit knowing next to nothing about join order planning :-( Could you
maybe explain why it would be non-trivial to determine if a relation has
join-order restrictions? Surely we already determine that, no? So what
would we need to do differently?

Or are you saying that because has_join_restriction() treats each
relation with an outer join as having a restriction, that makes it
unusable for the purpose of this optimization/patch? And we'd need to
invent something more elaborate?

I'm not sure that's quite true. The problem with joining the dimensions
(with inner joins) is *exactly* the lack of restrictions, which means
that we explore all possible orders of those dimensions (all N! of them). With
the restrictions (e.g. from LEFT JOIN), that's no longer true - in a
way, this is similar to what the patch does. And in fact, replacing the
inner joins with LEFT JOINs makes the queries much faster. I've seen
this used as a workaround to cut down on planning time ...

So I don't think treating outer joins as "having restriction" is a
problem. It doesn't regress any queries, although it might lead to a
somewhat strange situation in which "less restricted" joins are faster
to plan.

regards

--
Tomas Vondra

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: should we have a fast-path planning for OLTP starjoins?
Date:	2025-07-28 19:44:11
Message-ID:	[email protected]
Lists:	pgsql-hackers

On 2/4/25 22:55, Tom Lane wrote:
> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>>> The interesting thing about this is we pretty much have all the
>>> infrastructure for detecting such FK-related join conditions
>>> already. Possibly the join order forcing could be done with
>>> existing infrastructure too (by manipulating the joinlist).
>
>> Maybe, interesting. I've ruled out relying on the FKeys early in the
>> coding, but I'm sure there's infrastructure the patch could use.
>
> It would be very sad to do that work twice in a patch that purports
> to reduce planning time. If it's done too late to suit you now,
> could we move it to happen earlier?
>
>> What kind of "manipulation" of the joinlist you have in mind?
>
> Right now, if we have four tables to join, we have a joinlist
> (A B C D). (Really they're integer relids, but let's use names here.)
> If we decide to force C to be joined last, it should be sufficient to
> convert this to ((A B D) C). If C and D both look like candidates for
> this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
> This is pretty much the same thing that happens if you set
> join_collapse_limit to 1 and use JOIN syntax to force a join order.
> In fact, IIRC we start out with nested joinlists and there is some
> code that normally flattens them until it decides it'd be creating
> too large a sub-problem. I'm suggesting selectively reversing the
> flattening.
>
> regards, tom lane

Here's a patch trying to do it more like this - by manipulating the
lists describing the join problems before they're passed to the actual
join search algorithm (which is where the PoC patch did this).
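
For readers following along, the joinlist manipulation Tom describes above can be sketched in a few lines of Python. This is only a model: the function name is made up, strings stand in for the relids, and it says nothing about how the candidates are identified or whether the resulting joins are legal.

```python
from typing import Sequence


def push_to_end(joinlist: Sequence[str], forced: Sequence[str]) -> list:
    """Nest a flat joinlist so that the `forced` relations are joined last.

    Relations not in `forced` stay together as one sub-problem for the
    regular join search; each forced relation then wraps the result in
    one more level of nesting, in the order given.
    """
    nested: list = [rel for rel in joinlist if rel not in forced]
    for rel in forced:
        if rel in joinlist:
            nested = [nested, rel]
    return nested


print(push_to_end(["A", "B", "C", "D"], ["C"]))
# [['A', 'B', 'D'], 'C']
print(push_to_end(["A", "B", "C", "D"], ["C", "D"]))
# [[['A', 'B'], 'C'], 'D']
```

The two results correspond to ((A B D) C) and (((A B) C) D) in the notation used above.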

I wonder if this is roughly the place where you imagined this would be
done, or if you envision some other issue with this approach. The patch
is definitely incomplete, there's plenty of XXX comments about places
missing some code, etc.

I initially tried to manipulate the joinlist much earlier - pretty much
right at the end of deconstruct_jointree. But that turned out to be way
too early. To identify dimensions etc. we need to check stuff about
foreign keys, join clauses, ... and that's not available that early.

So I think this needs to happen much later in query_planner(), and the
patch does it right before the make_one_rel() call. Maybe that's too
late, but it needs to happen after match_foreign_keys_to_quals(), as it
relies on some of the FK info built by that call. Maybe we could call
match_foreign_keys_to_quals() earlier, but I don't quite see any
benefits of doing that ...

On 2/8/25 02:49, Tomas Vondra wrote:
> On 2/8/25 01:23, Tom Lane wrote:
>> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>>> Instead, I was thinking about the "other" joins (if there are any), that
>>> may add or remove rows. AFAIK we want to join the dimensions at the
>>> place with the lowest cardinality - the discussion mostly assumed the
>>> joins would only reduce the cardinality, in which case we'd just leave
>>> the dimensions until the very end.
>>
>>> But ISTM that may not be necessarily true. Let's say there's a join that
>>> "multiplies" each row. It'll probably be done at the end, and the
>>> dimension joins should probably happen right before it ... not sure.
>>
>> I thought the idea here was to get rid of as much join order searching
>> as we could. Insisting that we get the best possible plan anyway
>> seems counterproductive, not to mention very messy to implement.
>> So I'd just push all these joins to the end and be done with it.
>>
>
> Right, cutting down on the join order searching is the point. But most
> of the savings comes (I think) from not considering different ordering
> of the dimensions, because those are all essentially the same.
>
> Consider a join with 16 relations, 10 of which are dimensions. There are
> 10! possible orders of the dimensions, but all of them behave pretty
> much exactly the same. In a way, this behaves almost like a join with 7
> relations, one of which represents all the 10 dimensions.
>
> I think this "join group" abstraction (a relation representing a bunch
> of relations in a particular order) would make this reasonably clean to
> implement. I haven't tried yet, though.
>
> Yes, this means we'd explore more orderings, compared to just pushing
> all the dimensions to the end. In the example above, that'd be 7!/6!, so
> up to ~7x orderings.
>
> I don't know if this is worth the extra complexity, of course.
>

I'm still concerned about regressions when we happen to postpone some
expensive dimension joins until after a join that increases the
cardinality. The "join group" would address that. But it probably is not
a very common query pattern, and it'd require changes to
join_search_one_level.
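
To put a number on that regression concern, here is a toy Python calculation. The cost model is an assumption made for illustration: each dimension join does one lookup per input row and leaves the row count unchanged, and there is exactly one join that multiplies the row count.

```python
def dimension_lookups(input_rows: int, multiplier: int,
                      n_dimensions: int, dims_first: bool) -> int:
    """Count dimension lookups for one row-multiplying join plus N dimensions.

    With dims_first=True the dimensions are joined before the multiplying
    join; otherwise they are pushed to the very end.
    """
    rows_at_dimensions = input_rows if dims_first else input_rows * multiplier
    return rows_at_dimensions * n_dimensions


print(dimension_lookups(100, 10, 10, dims_first=True))   # 1000
print(dimension_lookups(100, 10, 10, dims_first=False))  # 10000
```

Under these assumptions, pushing the dimensions to the very end costs as many times more lookups as the multiplying join adds rows.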

I'm not sure what could / should count as a 'dimension'. The patch doesn't
implement this yet, but I think it can mostly copy how we match the FK
to the join in get_foreign_key_join_selectivity. There's probably more
to think about, though. For example, don't we need to check NOT NULL /
nullfrac on the referencing side? Also, it probably interacts with
LEFT/OUTER joins. I plan to start strict and then relax and handle some
additional cases.
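
A deliberately strict version of such a check could look roughly like the Python sketch below. Everything in it is hypothetical: the ForeignKey class, the argument names, and the rules themselves are assumptions for illustration, not what the patch or get_foreign_key_join_selectivity actually does. It accepts a relation as a dimension only if it has no restrictions of its own, the join clauses match a foreign key from the fact table exactly, and the referencing columns are NOT NULL.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class ForeignKey:
    referencing: str                      # table holding the FK columns
    referenced: str                       # table the FK points to
    columns: FrozenSet[Tuple[str, str]]   # (referencing col, referenced col)


def is_dimension(rel: str, fact: str,
                 join_pairs: FrozenSet[Tuple[str, str]],
                 fks: Iterable[ForeignKey],
                 not_null_cols: Set[Tuple[str, str]],
                 restricted_rels: Set[str]) -> bool:
    """Strict test: rel joins fact exactly on an FK with NOT NULL columns."""
    if rel in restricted_rels:
        return False                      # extra clauses may change row count
    for fk in fks:
        if fk.referencing != fact or fk.referenced != rel:
            continue
        if fk.columns != join_pairs:
            continue                      # join clauses must match the FK
        # A NULL in a referencing column would drop the row in an inner join.
        return all((fact, col) in not_null_cols for col, _ in fk.columns)
    return False


fk = ForeignKey("f", "dim1", frozenset({("id1", "id")}))
pairs = frozenset({("id1", "id")})

print(is_dimension("dim1", "f", pairs, [fk], {("f", "id1")}, set()))     # True
print(is_dimension("dim1", "f", pairs, [fk], set(), set()))              # False
print(is_dimension("dim1", "f", pairs, [fk], {("f", "id1")}, {"dim1"}))  # False
```

Relaxing the rules later (for example, tolerating a small nullfrac or handling LEFT JOINs) would mean loosening individual conditions in this test.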

I'm however struggling with the concept of join order restrictions a
bit. I suspect we need to worry about that when walking the relation
list and figuring out what can be a dimension, but I've never worked
with this, so my mental model of how this works is not great.

Another question is if this planning shortcut (which for the dimensions
mostly picks a random join order) could have some unexpected impact on
the rest of the planning. For example, could we "miss" some join
producing tuples in an interesting order? Or could we fail to consider a
partition-wise join?

Could this "shortcut" restrict the rest of the plan in some undesirable
way? For example, if some of the tables are partitioned, could this mean
we no longer can do partition-wise joins with the (mostly arbitrary)
join order we picked?

There's also a "patch" directory, with some SQL scripts creating two
simple examples of schemas/queries that would benefit from this. The
"create-1/select-1" examples are the simple starjoins this thread
focuses on. They might be modified to do a "snowflake" join, which is
fundamentally a variant of this query type.

The second example (create-2/select-2) is quite different, in that it's
not a starjoin schema. It still joins one "main" table with multiple
"dimensions", but the FKs go in the other direction (to a single column
in the main table). But it has a very similar bottleneck - searching for
the join order is expensive, but the order often does not matter very
much, because the query matches just one row anyway. And even if it
returns more rows,
isn't the join order determined just by the selectivity of each join?
Maybe the starjoin optimization could be made to work for this type too?

regards

--
Tomas Vondra

Attachment	Content-Type	Size
v2-0001-Simplify-planning-of-starjoin-queries.patch	text/x-patch	23.5 KB