optimize several list functions with SIMD intrinsics

Lists: pgsql-hackers
From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: optimize several list functions with SIMD intrinsics
Date: 2023-03-08 00:25:02
Message-ID: 20230308002502.GA3378449@nathanxps13
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

I noticed that several of the List functions do simple linear searches that
can be optimized with SIMD intrinsics (as was done for XidInMVCCSnapshot in
37a6e5d). The following table shows the time spent iterating over a list
of n elements (via list_member_int) one billion times on my x86 laptop.

n | head (ms) | patched (ms)
------+-----------+--------------
2 | 3884 | 3001
4 | 5506 | 4092
8 | 6209 | 3026
16 | 8797 | 4458
32 | 25051 | 7032
64 | 37611 | 12763
128 | 61886 | 22770
256 | 111170 | 59885
512 | 209612 | 103378
1024 | 407462 | 189484

I've attached a work-in-progress patch that implements these optimizations
for both x86 and arm, and I will register this in the July commitfest. I'm
posting this a little early in order to gauge interest. Presumably you
shouldn't be using a List if you have many elements that must be routinely
searched, but it might be nice to speed up these functions, anyway. This
was mostly a fun weekend project, and I don't presently have any concrete
examples of workloads where this might help.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v1-0001-speed-up-several-list-functions-with-SIMD.patch text/x-diff 18.8 KB

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-03-08 00:54:15
Message-ID: CAApHDvoZUGY+BBuGTMQoxwqSe9C9mPs2zZ+rUWbWqur23=LAWw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 8 Mar 2023 at 13:25, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> I've attached a work-in-progress patch that implements these optimizations
> for both x86 and arm, and I will register this in the July commitfest. I'm
> posting this a little early in order to gauge interest.

Interesting and quite impressive performance numbers.

From having a quick glance at the patch, it looks like you'll need to
take some extra time to make it work on 32-bit builds.

David


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-03-08 04:56:58
Message-ID: 20230308045658.GA3419785@nathanxps13
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 08, 2023 at 01:54:15PM +1300, David Rowley wrote:
> Interesting and quite impressive performance numbers.

Thanks for taking a look.

> From having a quick glance at the patch, it looks like you'll need to
> take some extra time to make it work on 32-bit builds.

At the moment, the support for SIMD intrinsics in Postgres is limited to
64-bit (simd.h has the details). But yes, if we want to make this work for
32-bit builds, additional work is required.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-03-08 18:58:09
Message-ID: 20230308185809.GA3552675@nathanxps13
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

cfbot's Windows build wasn't happy with a couple of casts. I applied a
fix similar to c6a43c2 in v2. The patch is still very much a work in
progress.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v2-0001-speed-up-several-list-functions-with-SIMD.patch text/x-diff 18.9 KB

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-03-11 09:41:18
Message-ID: 167852767890.628976.10354523915450248589.pgcf@coridan.postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Hello,

Adding some review comments:

1. In list_member_ptr, will it be okay to bring `const ListCell *cell` from
#ifdef USE_NO_SIMD
const ListCell *cell;
#endif
to #else like as mentioned below? This will make visual separation between #if cases more cleaner
#else
const ListCell *cell;

foreach(cell, list)
{
if (lfirst(cell) == datum)
return true;
}

return false;

#endif

2. Lots of duplicated
if (list == NIL) checks before calling list_member_inline_internal(list, datum)
Can we not add this check in list_member_inline_internal itself?
3. if (cell)
return list_delete_cell(list, cell) in list_delete_int/oid
can we change if (cell) to if (cell != NULL) as used elsewhere?
4. list_member_inline_interal_idx , there is typo in name.
5. list_member_inline_interal_idx and list_member_inline_internal are practically same with almost 90+ % duplication.
can we not refactor both into one and allow cell or true/false as return? Although I see usage of const ListCell so this might be problematic.
6. Loop for (i = 0; i < tail_idx; i += nelem_per_iteration) for Vector32 in list.c looks duplicated from pg_lfind32 (in pg_lfind.h), can we not reuse that?
7. Is it possible to add a benchmark which shows improvement against HEAD ?

Regards,
Ankit

The new status of this patch is: Waiting on Author


From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-03-11 10:58:55
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers


> 7. Is it possible to add a benchmark which shows improvement against
HEAD ?

Please ignore this from my earlier mail, I just saw stats now.

Thanks,

Ankit


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-03-13 21:40:27
Message-ID: 20230313214027.GB423713@nathanxps13
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for taking a look.

On Sat, Mar 11, 2023 at 09:41:18AM +0000, Ankit Kumar Pandey wrote:
> 1. In list_member_ptr, will it be okay to bring `const ListCell *cell` from
> #ifdef USE_NO_SIMD
> const ListCell *cell;
> #endif
> to #else like as mentioned below? This will make visual separation between #if cases more cleaner

I would expect to see -Wdeclaration-after-statement warnings if we did
this.

> 2. Lots of duplicated
> if (list == NIL) checks before calling list_member_inline_internal(list, datum)
> Can we not add this check in list_member_inline_internal itself?

We probably could. I only extracted the NIL checks to simplify the code in
list_member_inline_internal() a bit.

> 3. if (cell)
> return list_delete_cell(list, cell) in list_delete_int/oid
> can we change if (cell) to if (cell != NULL) as used elsewhere?

Sure.

> 4. list_member_inline_interal_idx , there is typo in name.

Will fix.

> 5. list_member_inline_interal_idx and list_member_inline_internal are practically same with almost 90+ % duplication.
> can we not refactor both into one and allow cell or true/false as return? Although I see usage of const ListCell so this might be problematic.

The idea was to skip finding the exact match if all we care about is
whether the element exists. This micro-optimization might be negligible,
in which case we could use list_member_inline_internal_idx() for both
cases.

> 6. Loop for (i = 0; i < tail_idx; i += nelem_per_iteration) for Vector32 in list.c looks duplicated from pg_lfind32 (in pg_lfind.h), can we not reuse that?

The list.c version is slightly different because we need to disregard any
matches that we find in between the data. For example, in an integer List,
the integer will take up 4 bytes of the 8-byte ListCell (for 64-bit
platforms).

typedef union ListCell
{
void *ptr_value;
int int_value;
Oid oid_value;
TransactionId xid_value;
} ListCell;

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-03-15 14:01:46
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Agree with your points Nathan. Just a headup.

> On 14/03/23 03:10, Nathan Bossart wrote:
> On Sat, Mar 11, 2023 at 09:41:18AM +0000, Ankit Kumar Pandey wrote:
>> 1. In list_member_ptr, will it be okay to bring `const ListCell *cell` from
>> #ifdef USE_NO_SIMD
>> const ListCell *cell;
>> #endif
>> to #else like as mentioned below? This will make visual separation between #if cases more cleaner
> I would expect to see -Wdeclaration-after-statement warnings if we did
> this.

This worked fine for me, no warnings on gcc 12.2.0. Not a concern though.

Thanks,

Ankit


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-03-15 16:23:19
Message-ID: 20230315162319.GA609684@nathanxps13
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 15, 2023 at 07:31:46PM +0530, Ankit Kumar Pandey wrote:
>> On 14/03/23 03:10, Nathan Bossart wrote:
>> On Sat, Mar 11, 2023 at 09:41:18AM +0000, Ankit Kumar Pandey wrote:
>> > 1. In list_member_ptr, will it be okay to bring `const ListCell
>> > *cell` from #ifdef USE_NO_SIMD
>> > const ListCell *cell;
>> > #endif
>> > to #else like as mentioned below? This will make visual separation between #if cases more cleaner
>> I would expect to see -Wdeclaration-after-statement warnings if we did
>> this.
>
> This worked fine for me, no warnings on gcc 12.2.0. Not a concern though.

Did you try building without SIMD support? This is what I see:

list.c: In function ‘list_member_ptr’:
list.c:697:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
697 | const ListCell *cell;
| ^~~~~

If your build doesn't have USE_NO_SIMD defined, this warning won't appear
because the code in question will be compiled out.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-03-15 16:48:22
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers


On 15/03/23 21:53, Nathan Bossart wrote:

> Did you try building without SIMD support? This is what I see:

> list.c: In function ‘list_member_ptr’:
> list.c:697:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
> 697 | const ListCell *cell;
> | ^~~~~

> If your build doesn't have USE_NO_SIMD defined, this warning won't appear
> because the code in question will be compiled out.

My mistake, I tried with USE_NO_SIMD defined and it showed the warning. Sorry for the noise.

Regards,
Ankit


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-04-17 20:42:50
Message-ID: 20230417204250.GA1347414@nathanxps13
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Here is a new patch set. I've split it into two patches: one for the
64-bit functions, and one for the 32-bit functions. I've also added tests
for pg_lfind64/pg_lfind64_idx and deduplicated the code a bit.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v3-0001-speed-up-list_member_ptr-and-list_delete_ptr-with.patch text/x-diff 14.5 KB
v3-0002-speed-up-several-functions-for-lists-with-inline-.patch text/x-diff 6.4 KB

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-04-21 06:50:34
Message-ID: CAFBsxsF0srfW7D7Htykj+Vayp1o=uk9pUgCwWQu48rZdLgDLTw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 8, 2023 at 7:25 AM Nathan Bossart <nathandbossart(at)gmail(dot)com>
wrote:
>
> was mostly a fun weekend project, and I don't presently have any concrete
> examples of workloads where this might help.

It seems like that should be demonstrated before seriously considering
this, like a profile where the relevant list functions show up.

--
John Naylor
EDB: http://www.enterprisedb.com


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-04-21 20:33:58
Message-ID: 20230421203358.GB1440148@nathanxps13
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 21, 2023 at 01:50:34PM +0700, John Naylor wrote:
> On Wed, Mar 8, 2023 at 7:25 AM Nathan Bossart <nathandbossart(at)gmail(dot)com>
> wrote:
>> was mostly a fun weekend project, and I don't presently have any concrete
>> examples of workloads where this might help.
>
> It seems like that should be demonstrated before seriously considering
> this, like a profile where the relevant list functions show up.

Agreed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: optimize several list functions with SIMD intrinsics
Date: 2023-07-10 10:50:59
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 21/04/2023 23:33, Nathan Bossart wrote:
> On Fri, Apr 21, 2023 at 01:50:34PM +0700, John Naylor wrote:
>> On Wed, Mar 8, 2023 at 7:25 AM Nathan Bossart <nathandbossart(at)gmail(dot)com>
>> wrote:
>>> was mostly a fun weekend project, and I don't presently have any concrete
>>> examples of workloads where this might help.
>>
>> It seems like that should be demonstrated before seriously considering
>> this, like a profile where the relevant list functions show up.
>
> Agreed.

Grepping for "tlist_member" and "list_delete_ptr", I don't see any
callers in hot codepaths where this could make a noticeable difference.
So I've marked this as Returned with Feedback in the commitfest.

> I noticed that several of the List functions do simple linear searches that
> can be optimized with SIMD intrinsics (as was done for XidInMVCCSnapshot in
> 37a6e5d). The following table shows the time spent iterating over a list
> of n elements (via list_member_int) one billion times on my x86 laptop.
>
>
> n | head (ms) | patched (ms)
> ------+-----------+--------------
> 2 | 3884 | 3001
> 4 | 5506 | 4092
> 8 | 6209 | 3026
> 16 | 8797 | 4458
> 32 | 25051 | 7032
> 64 | 37611 | 12763
> 128 | 61886 | 22770
> 256 | 111170 | 59885
> 512 | 209612 | 103378
> 1024 | 407462 | 189484

I'm surprised to see an improvement with n=2 and n=2. AFAICS, the
vectorization only kicks in when n >= 8.

--
Heikki Linnakangas
Neon (https://neon.tech)