Lists: | pgsql-hackers |
---|
From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Some optimisations for numeric division |
Date: | 2022-02-23 13:34:59 |
Message-ID: | CAEZATCVwsBi-ND-t82Cuuh1=8ee6jdOpzsmGN+CUZB6yjLg9jw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Attached are 3 small patches that improve the performance of numeric
division. Patch 0002 seems to have the biggest impact, but I think
they're all worth including, since they're quite simple changes, with
noticeable performance gains.
Patch 0001 vectorises the inner loop of div_var_fast(). This loop is
almost identical to the inner loop of mul_var(), which was vectorised
by commit 8870917623. The thing preventing the div_var_fast() loop
from being vectorised is the assignment to div[qi + i], and simply
replacing that with div_qi[i] where div_qi = &div[qi], in the same
style as mul_var(), fixes that.
One difference between this and the mul_var() code is that it is also
necessary to add an explicit cast to get the compiler to generate
matching output, and give the best results. This is because in
mul_var() the compiler recognises that var1digit is actually 16-bit,
rather than 32-bit, and so it doesn't need to multiply the high part,
but in div_var_fast() it's not obvious to the compiler that qdigit
also fits in 16 bits, hence the cast.
The actual performance gain is typically quite modest, since
div_var_fast() is always only a part of some more complex numeric
computation, but it can be seen in cases like raising numbers to
negative integer powers, e.g.:
CREATE TEMP TABLE num_test(x numeric);
SELECT setseed(0);
INSERT INTO num_test
SELECT (SELECT ('1.'||string_agg((random()*9)::int::text, '')||x)::numeric
FROM generate_series(1,100))
FROM generate_series(1,100000) g(x);
SELECT sum(x^(-2)) FROM num_test;
Time: 112.949 ms (HEAD)
Time: 98.537 ms (with patch)
Patch 0002 simplifies the inner loop of div_var() (the guts of the
public-facing numeric division operator) by more closely combining the
multiplication and subtraction operations and folding the separate
"carry" and "borrow" variables into a single "borrow", as suggested by
the old code comment.
IMO, this makes the code easier to understand, as well as giving more
significant performance gains:
CREATE TEMP TABLE div_test(x numeric);
SELECT setseed(0);
INSERT INTO div_test
SELECT (SELECT ('1.'||string_agg((random()*9)::int::text, ''))::numeric + x
FROM generate_series(1,50))
FROM generate_series(1,1000) g(x);
SELECT sum(x/y) FROM div_test t1(x), div_test t2(y);
Time: 1477.340 ms (HEAD)
Time: 826.748 ms (with patch)
Patch 0003 replaces some uses of div_var_fast() with div_var().
Specifically, when the divisor has just one or two base-NBASE digits,
div_var() is faster. This is especially true for 1-digit divisors, due
to the "fast path" code in div_var() to handle that. It's also just
about true for 2-digit divisors, and it occurs to me that that case
could potentially be optimised further with similar fast path code in
div_var(), but I haven't tried that yet.
One-digit divisors are quite common. For example, in the Taylor series
expansions in exp_var() and ln_var(), since the number of Taylor
series terms never exceeds a few hundred in practice. Also,
exp_var()'s argument reduction step divides by 2^ndiv2, which is
roughly 100 times the input, rounded up to a power of two, and valid
inputs are less than 6000, so this will always be one or two digits.
Testing this with a bunch of random exp() and ln() computations I saw
a speed-up of 15-20%, and it reduced the run time of the numeric-big
regression test by around 10%, which seems worth having.
Regards,
Dean
Attachment | Content-Type | Size |
---|---|---|
0001-Apply-auto-vectorization-to-the-inner-loop-of-div_va.patch | text/x-patch | 2.0 KB |
0003-Use-div_var-instead-of-div_var_fast-when-the-divisor.patch | text/x-patch | 2.8 KB |
0002-Simplify-the-inner-loop-of-numeric-division-in-div_v.patch | text/x-patch | 2.9 KB |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Some optimisations for numeric division |
Date: | 2022-02-23 20:55:12 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> Attached are 3 small patches that improve the performance of numeric
> division. Patch 0002 seems to have the biggest impact, but I think
> they're all worth including, since they're quite simple changes, with
> noticeable performance gains.
I took a quick look through these (just eyeball, didn't try to verify
your performance statements). I'm +1 on 0001 and 0002, but 0003 feels
a bit ad-hoc. It certainly *looks* weird for the allegedly faster
function to be handing off to the allegedly slower one. I also wonder
if we're leaving anything on the table by not exploiting
div_var_fast's weaker roundoff guarantees in this case. Should we
think about a more thoroughgoing redesign of these functions' APIs?
Another idea is to only worry about the single-divisor-digit
optimization, and just copy div_var's (very small) inner loop for that
case into div_var_fast.
regards, tom lane
From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Some optimisations for numeric division |
Date: | 2022-02-23 22:46:50 |
Message-ID: | CAEZATCW7mFUKrxxje4HKzMEfLgxgp4mPMEb1qTaMuZf8JuArKA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, 23 Feb 2022 at 20:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I took a quick look through these (just eyeball, didn't try to verify
> your performance statements).
Thanks for looking!
> I'm +1 on 0001 and 0002, but 0003 feels
> a bit ad-hoc. It certainly *looks* weird for the allegedly faster
> function to be handing off to the allegedly slower one. I also wonder
> if we're leaving anything on the table by not exploiting
> div_var_fast's weaker roundoff guarantees in this case. Should we
> think about a more thoroughgoing redesign of these functions' APIs?
Hmm, I'm not sure what kind of thing you had in mind.
One thought that occurred to me was that it's a bit silly that
exp_var() and ln_var() have to use a NumericVar for what could just be
an int, if we had a div_var_int() function that could divide by an
int. Then both div_var() and div_var_fast() could hand off to it for
one and two digit divisors.
Regards,
Dean
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Some optimisations for numeric division |
Date: | 2022-02-23 22:55:02 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On Wed, 23 Feb 2022 at 20:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm +1 on 0001 and 0002, but 0003 feels
>> a bit ad-hoc. It certainly *looks* weird for the allegedly faster
>> function to be handing off to the allegedly slower one. I also wonder
>> if we're leaving anything on the table by not exploiting
>> div_var_fast's weaker roundoff guarantees in this case. Should we
>> think about a more thoroughgoing redesign of these functions' APIs?
> Hmm, I'm not sure what kind of thing you had in mind.
I'm not either, tbh. Just seems like this needs more than some
hacking around the margins.
> One thought that occurred to me was that it's a bit silly that
> exp_var() and ln_var() have to use a NumericVar for what could just be
> an int, if we had a div_var_int() function that could divide by an
> int. Then both div_var() and div_var_fast() could hand off to it for
> one and two digit divisors.
Oooh, that seems like a good idea.
regards, tom lane
From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Some optimisations for numeric division |
Date: | 2022-02-25 10:45:31 |
Message-ID: | CAEZATCXGm=DyTq=FrcOqC0gPMVveKUYTaD5KRRoajrUTiWxVMw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, 23 Feb 2022 at 22:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>
> > One thought that occurred to me was that it's a bit silly that
> > exp_var() and ln_var() have to use a NumericVar for what could just be
> > an int, if we had a div_var_int() function that could divide by an
> > int. Then both div_var() and div_var_fast() could hand off to it for
> > one and two digit divisors.
>
> Oooh, that seems like a good idea.
>
OK, I've replaced the 0003 patch with one that does that instead. The
div_var_int() API is slightly different in that it also accepts a
divisor weight argument, but the alternative would have been for the
callers to have to adjust both the result weight and rscale, which
would have been uglier.
There's a large block of code in div_var() that needs re-indenting,
but I think it would be better to leave that to a later pgindent run.
The performance results are quite pleasing. It's slightly faster than
the old one-digit div_var() code because it manages to avoid some
digit array copying, and for two digit divisors it's much faster:
CREATE TEMP TABLE div_test(x numeric, y numeric);
SELECT setseed(0);
INSERT INTO div_test
SELECT (SELECT (((x%9)+1)||string_agg((random()*9)::int::text, ''))::numeric
FROM generate_series(1,50)),
(SELECT ('1.'||string_agg((random()*9)::int::text,
'')||(x%10)||'e3')::numeric
FROM generate_series(1,6))
FROM generate_series(1,5000) g(x);
select * from div_test limit 10;
SELECT sum(t1.x/t2.y) FROM div_test t1, div_test t2;
Time: 11600.034 ms (HEAD)
Time: 9890.788 ms (with 0002)
Time: 6202.851 ms (with 0003)
And obviously it'll be a larger relative gain for div_var_fast(),
since that was slower to begin with in such cases.
This makes me think that it might also be worthwhile to follow this
with a similar div_var_int64() function on platforms with 128-bit
integers, which could then be used to handle 3- and 4-digit divisors,
which are probably quite common in practice.
Attached is the updated patch series (0001 and 0002 unchanged).
Regards,
Dean
Attachment | Content-Type | Size |
---|---|---|
0002-Simplify-the-inner-loop-of-numeric-division-in-div_v.patch | text/x-patch | 2.9 KB |
0001-Apply-auto-vectorization-to-the-inner-loop-of-div_va.patch | text/x-patch | 2.0 KB |
0003-Optimise-numeric-division-for-one-and-two-base-NBASE.patch | text/x-patch | 10.7 KB |
From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Some optimisations for numeric division |
Date: | 2022-02-25 12:43:48 |
Message-ID: | CAEZATCWuO3vm8wdp15jAiKpDxvude6c6mVEqn+AOfWwEkY4v=Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, 25 Feb 2022 at 10:45, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>
> Attached is the updated patch series (0001 and 0002 unchanged).
>
And another update following feedback from the cfbot.
Regards,
Dean
Attachment | Content-Type | Size |
---|---|---|
0001-Apply-auto-vectorization-to-the-inner-loop-of-div_va.patch | text/x-patch | 2.0 KB |
0002-Simplify-the-inner-loop-of-numeric-division-in-div_v.patch | text/x-patch | 2.9 KB |
0003-Optimise-numeric-division-for-one-and-two-base-NBASE.patch | text/x-patch | 10.7 KB |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Some optimisations for numeric division |
Date: | 2022-02-25 18:30:33 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> And another update following feedback from the cfbot.
This patchset LGTM. On my machine there doesn't seem to be any
measurable performance change for the numeric regression test,
but numeric_big gets about 15% faster.
The only additional thought I have, based on your comments about
0001, is that maybe in mul_var we should declare var1digit as
being NumericDigit not int. I tried that here and didn't see
any material change in the generated assembly code (using gcc
8.5.0), so you're right that modern gcc doesn't need any help
there; but I wonder if it could help on other compiler versions.
I'll mark this RFC.
regards, tom lane
From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Some optimisations for numeric division |
Date: | 2022-02-27 11:23:27 |
Message-ID: | CAEZATCW2zUGkWuQKMgYQzHhjc-XtXQD4B62Hhe_biUtJw2bWGA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, 25 Feb 2022 at 18:30, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> > And another update following feedback from the cfbot.
>
> This patchset LGTM. On my machine there doesn't seem to be any
> measurable performance change for the numeric regression test,
> but numeric_big gets about 15% faster.
>
Yes, that matches my observations. Thanks for reviewing and testing.
> The only additional thought I have, based on your comments about
> 0001, is that maybe in mul_var we should declare var1digit as
> being NumericDigit not int. I tried that here and didn't see
> any material change in the generated assembly code (using gcc
> 8.5.0), so you're right that modern gcc doesn't need any help
> there; but I wonder if it could help on other compiler versions.
>
Yes, that makes sense. Done that way.
Regards,
Dean