Re: CHECKPOINT unlogged data

Lists: pgsql-hackers
From: Christoph Berg <myon(at)debian(dot)org>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: CHECKPOINT unlogged data
Date: 2025-05-30 16:17:45
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

A customer reported to use CHECKPOINT before shutdowns to make the
shutdowns themselves faster and asked if it was possible to make
CHECKPOINT optionally also write out unlogged table data for that
purpose. I think the idea makes sense, so there's the patch.

Christoph

Attachment Content-Type Size
v1-0001-Add-immediate-and-flush_all-options-to-checkpoint.patch text/x-diff 8.5 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-05-30 16:39:02
Message-ID: vp4cewuyo2cw4fk5sxjaovllhe3dscixxo2utu3cgqhzydrqec@wc2nhgx4khpk
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2025-05-30 18:17:45 +0200, Christoph Berg wrote:
> A customer reported to use CHECKPOINT before shutdowns to make the
> shutdowns themselves faster and asked if it was possible to make
> CHECKPOINT optionally also write out unlogged table data for that
> purpose. I think the idea makes sense, so there's the patch.

I've seen the need for this quite a few times, fwiw. So +1 to the idea.

Greetings,

Andres Freund


From: Christoph Berg <myon(at)debian(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-05-30 16:55:28
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Andres Freund
> Hi,
>
> On 2025-05-30 18:17:45 +0200, Christoph Berg wrote:
> > A customer reported to use CHECKPOINT before shutdowns to make the
> > shutdowns themselves faster and asked if it was possible to make
> > CHECKPOINT optionally also write out unlogged table data for that
> > purpose. I think the idea makes sense, so there's the patch.
>
> I've seen the need for this quite a few times, fwiw. So +1 to the idea.

Do we want to officially mention this use case in checkpoint.sgml? I've
already replaced the "is not intended for use during normal operation"
wording by "not required during normal operation", but we might go one
step further and endorse it.

Christoph


From: Andres Freund <andres(at)anarazel(dot)de>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-05-30 17:02:41
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On May 30, 2025 12:55:28 PM EDT, Christoph Berg <myon(at)debian(dot)org> wrote:
>Re: Andres Freund
>> Hi,
>>
>> On 2025-05-30 18:17:45 +0200, Christoph Berg wrote:
>> > A customer reported to use CHECKPOINT before shutdowns to make the
>> > shutdowns themselves faster and asked if it was possible to make
>> > CHECKPOINT optionally also write out unlogged table data for that
>> > purpose. I think the idea makes sense, so there's the patch.
>>
>> I've seen the need for this quite a few times, fwiw. So +1 to the idea.
>
>Do we want to officially mention this use case in checkpoint.sgml? I've
>already replaced the "is not intended for use during normal operation"
>wording by "not required during normal operation", but we might go one
>step further and endorse it.

Yes, I think it's good to mention what it is useful for.

Greetings,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-05-30 17:07:48
Message-ID: aDnl5N1u7UBYVaSt@nathan
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 30, 2025 at 12:39:02PM -0400, Andres Freund wrote:
> On 2025-05-30 18:17:45 +0200, Christoph Berg wrote:
>> A customer reported to use CHECKPOINT before shutdowns to make the
>> shutdowns themselves faster and asked if it was possible to make
>> CHECKPOINT optionally also write out unlogged table data for that
>> purpose. I think the idea makes sense, so there's the patch.
>
> I've seen the need for this quite a few times, fwiw. So +1 to the idea.

This patch also adds an IMMEDIATE option, which I proposed some time ago
[0]. I ended up withdrawing it due to general skepticism about its
usefulness. FWIW I have no concerns about adding a few retail options to
CHECKPOINT, but others might balk at options without solid use-cases. The
unlogged table one seems reasonable enough.

[0] https://postgr.es/m/17A03557-CF5C-4D4B-B719-A1D98DD75E75%40amazon.com

--
nathan


From: Christoph Berg <myon(at)debian(dot)org>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-05-30 17:23:04
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Nathan Bossart
> This patch also adds an IMMEDIATE option, which I proposed some time ago
> [0]. I ended up withdrawing it due to general skepticism about its

Thanks for the pointer, I did not go that far back when looking for
older threads.

When writing the patch, I was also thinking about naming the option
"fast" or "spread" but ultimately went with "immediate" because that's
what the log message is using:

=# checkpoint;
2025-05-30 18:23:17.433 CEST [579834] LOG: Checkpoint beginnt: immediate force wait

SQL command "(options)" tend to be booleans, hence "immediate {on|off}".
Introducing two separate keywords "fast" and "spread" seemed
confusing, and there is no precedent for "fast=on" in other tools or
the replication protocol.

> usefulness. FWIW I have no concerns about adding a few retail options to
> CHECKPOINT, but others might balk at options without solid use-cases. The
> unlogged table one seems reasonable enough.

I think the two options immediate and flush_all are actually useful in
combination for the shutdown case. If operation is to continue
normally until just before the shutdown, it might make sense to run
these 3 commands (or just #1 and #3):

checkpoint (flush_all, immediate false);
checkpoint (flush_all);
pg_ctl stop

(I also thought about a VERBOSE option, but since the checkpoint
messages are generated by a different process, it's probably harder
than I initially thought.)

Christoph


From: Andres Freund <andres(at)anarazel(dot)de>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-05-30 17:29:08
Message-ID: 7x6gsmev36zzh6lfzydkddbhpnxqgu2k2l2ci2k6foxhrstvj2@bn55g54wjrl4
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 2025-05-30 19:23:04 +0200, Christoph Berg wrote:
> Re: Nathan Bossart
> > This patch also adds an IMMEDIATE option, which I proposed some time ago
> > [0]. I ended up withdrawing it due to general skepticism about its
>
> Thanks for the pointer, I did not go that far back when looking for
> older threads.
>
> When writing the patch, I was also thinking about naming the option
> "fast" or "spread" but ultimately went with "immediate" because that's
> what the log message is using:
>
> =# checkpoint;
> 2025-05-30 18:23:17.433 CEST [579834] LOG: Checkpoint beginnt: immediate force wait
>
> SQL command "(options)" tend to be booleans, hence "immediate {on|off}".
> Introducing two separate keywords "fast" and "spread" seemed
> confusing, and there is no precedent for "fast=on" in other tools or
> the replication protocol.

I'd add a 'mode' that can be set to an arbitrary string, which then can be
validated in C code. That seems more future proof.


From: Christoph Berg <myon(at)debian(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-06 10:03:28
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Andres Freund
> I'd add a 'mode' that can be set to an arbitrary string, which then can be
> validated in C code. That seems more future proof.

Changed in the attached v2, thanks.

Christoph

Attachment Content-Type Size
v2-0001-Add-mode-and-flush_all-options-to-checkpoint.patch text/x-diff 10.1 KB

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-06 10:31:25
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 2025/06/06 19:03, Christoph Berg wrote:
> Re: Andres Freund
>> I'd add a 'mode' that can be set to an arbitrary string, which then can be
>> validated in C code. That seems more future proof.
>
> Changed in the attached v2, thanks.

When I applied the patch and compiled it, I got the following warnings:

utility.c:946:4: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
946 | CheckPointStmt *stmt = (CheckPointStmt *) parsetree;
| ^
utility.c:947:16: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
947 | ListCell *lc;
| ^
2 warnings generated.

RequestCheckpoint(CHECKPOINT_WAIT |
+ (immediate ? CHECKPOINT_IMMEDIATE : 0) |
+ (flush_all ? CHECKPOINT_FLUSH_ALL : 0) |

Some users might want to trigger a spread checkpoint but not wait for
it to finish, since it could take a long time? If that's a valid use case,
maybe we should add a WAIT option to let users choose whether to wait for
the checkpoint to complete or not?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation


From: Christoph Berg <myon(at)debian(dot)org>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-06 14:26:56
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Fujii Masao
> utility.c:946:4: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]

Thanks, my compiler didn't throw that. { } block added in v3.

> RequestCheckpoint(CHECKPOINT_WAIT |
> + (immediate ? CHECKPOINT_IMMEDIATE : 0) |
> + (flush_all ? CHECKPOINT_FLUSH_ALL : 0) |
>
> Some users might want to trigger a spread checkpoint but not wait for
> it to finish, since it could take a long time? If that's a valid use case,
> maybe we should add a WAIT option to let users choose whether to wait for
> the checkpoint to complete or not?

Do we want that? The checkpoint is only effective when it's finished,
and running `psql -c "checkpoint (wait false)"` might make people
shoot themselves into the foot.

Christoph

Attachment Content-Type Size
v3-0001-Add-mode-and-flush_all-options-to-checkpoint.patch text/x-diff 10.7 KB

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-06 15:26:36
Message-ID: aEMIrLUDOKgmE_0P@nathan
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 06, 2025 at 04:26:56PM +0200, Christoph Berg wrote:
> Re: Fujii Masao
>> Some users might want to trigger a spread checkpoint but not wait for
>> it to finish, since it could take a long time? If that's a valid use case,
>> maybe we should add a WAIT option to let users choose whether to wait for
>> the checkpoint to complete or not?
>
> Do we want that? The checkpoint is only effective when it's finished,
> and running `psql -c "checkpoint (wait false)"` might make people
> shoot themselves into the foot.

*shrug*

I imagine the documentation will pretty clearly indicate that setting WAIT
to "false" will cause CHECKPOINT to not wait for it to finish.

> + <term><literal>MODE</literal></term>
> + <listitem>
> + <para>
> + The <command>CHECKPOINT</command> command forces an immediate
> + checkpoint by default when the command is issued, without waiting for a
> + regular checkpoint scheduled by the system. <literal>FAST</literal> is a
> + synonym for <literal>IMMEDIATE</literal>.
> + </para>
> + <para>
> + A <literal>SPREAD</literal> checkpoint will instead spread out the write load
> + as determined by the <xref linkend="guc-checkpoint-completion-target"/>
> + setting, like the system-scheduled checkpoints.
> + </para>
> + </listitem>
> + </varlistentry>

I don't understand why we need to add both FAST and IMMEDIATE.

> + <varlistentry>
> + <term><literal>FLUSH_ALL</literal></term>
> + <listitem>
> + <para>
> + Requests the checkpoint to also flush data of <literal>UNLOGGED</literal>
> + relations. Defaults to off.
> + </para>
> + </listitem>
> + </varlistentry>

Could we rename this to something like FLUSH_UNLOGGED or INCLUDE_UNLOGGED?
IMHO that's more descriptive.

My attempt at this patch back in 2020 included the following note, which
seems relevant here:

+ Note that the server may consolidate concurrently requested checkpoints or
+ restartpoints. Such consolidated requests will contain a combined set of
+ options. For example, if one session requested an immediate checkpoint and
+ another session requested a non-immediate checkpoint, the server may combine
+ these requests and perform one immediate checkpoint.

We might also want to make sure it's clear that CHECKPOINT does nothing if
there's been no database activity since the last one (or, in the case of a
restartpoint, if there hasn't been a checkpoint record).

--
nathan


From: Christoph Berg <myon(at)debian(dot)org>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-06 16:20:21
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Nathan Bossart
> I imagine the documentation will pretty clearly indicate that setting WAIT
> to "false" will cause CHECKPOINT to not wait for it to finish.

I can add it, it's easy enough...

> I don't understand why we need to add both FAST and IMMEDIATE.

We have both:

=# checkpoint ;
2025-06-06 18:09:25.743 CEST [872379] LOG: checkpoint starting: immediate force wait

pg_basebackup --checkpoint=fast

Could we settle for one official name for that? Then we could use that
name in all contexts.

> > + <term><literal>FLUSH_ALL</literal></term>
>
> Could we rename this to something like FLUSH_UNLOGGED or INCLUDE_UNLOGGED?
> IMHO that's more descriptive.

That's again coming from what the log message is saying:

=# checkpoint (flush_all);
2025-06-06 18:12:46.298 CEST [873436] LOG: checkpoint starting: immediate force wait flush-all

I think we should be consistent there.

#define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those
* belonging to unlogged tables */

Maybe CHECKPOINT_FLUSH_UNLOGGED would be more explicit?

> My attempt at this patch back in 2020 included the following note, which
> seems relevant here:
>
> + Note that the server may consolidate concurrently requested checkpoints or
> + restartpoints. Such consolidated requests will contain a combined set of
> + options. For example, if one session requested an immediate checkpoint and
> + another session requested a non-immediate checkpoint, the server may combine
> + these requests and perform one immediate checkpoint.

The CHECKPOINT documentation links to `28.5. WAL Configuration`,
should this be mentioned there instead?

> We might also want to make sure it's clear that CHECKPOINT does nothing if
> there's been no database activity since the last one (or, in the case of a
> restartpoint, if there hasn't been a checkpoint record).

That's taken care of by "force":

#define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */

Christoph


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-06 16:33:36
Message-ID: aEMYYEkQb2VeQoBo@nathan
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 06, 2025 at 06:20:21PM +0200, Christoph Berg wrote:
> Re: Nathan Bossart
>> I don't understand why we need to add both FAST and IMMEDIATE.
>
> We have both:
>
> =# checkpoint ;
> 2025-06-06 18:09:25.743 CEST [872379] LOG: checkpoint starting: immediate force wait
>
> pg_basebackup --checkpoint=fast
>
> Could we settle for one official name for that? Then we could use that
> name in all contexts.

That seems like a good idea to me. I'm tempted to say that "fast" more
accurately describes what's happening than "immediate." "Immediate" sounds
like it happens instantaneously, but it's actually just happening "fast,"
i.e., as fast as possible.

>> > + <term><literal>FLUSH_ALL</literal></term>
>>
>> Could we rename this to something like FLUSH_UNLOGGED or INCLUDE_UNLOGGED?
>> IMHO that's more descriptive.
>
> That's again coming from what the log message is saying:
>
> =# checkpoint (flush_all);
> 2025-06-06 18:12:46.298 CEST [873436] LOG: checkpoint starting: immediate force wait flush-all
>
> I think we should be consistent there.
>
> #define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those
> * belonging to unlogged tables */
>
> Maybe CHECKPOINT_FLUSH_UNLOGGED would be more explicit?

WFM.

>> My attempt at this patch back in 2020 included the following note, which
>> seems relevant here:
>>
>> + Note that the server may consolidate concurrently requested checkpoints or
>> + restartpoints. Such consolidated requests will contain a combined set of
>> + options. For example, if one session requested an immediate checkpoint and
>> + another session requested a non-immediate checkpoint, the server may combine
>> + these requests and perform one immediate checkpoint.
>
> The CHECKPOINT documentation links to `28.5. WAL Configuration`,
> should this be mentioned there instead?

I thought it would make sense to put it closer to where these options are
described, since it'll be most evident for manually-initiated checkpoints.

>> We might also want to make sure it's clear that CHECKPOINT does nothing if
>> there's been no database activity since the last one (or, in the case of a
>> restartpoint, if there hasn't been a checkpoint record).
>
> That's taken care of by "force":
>
> #define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */

Oh, I see that we always specify that for CHECKPOINT commands, except for
restartpoints. IIRC even if you do specify CHECKPOINT_FORCE for a
restartpoint, it'll have no effect. It's proably worth mentioning that
case, at least.

--
nathan


From: Christoph Berg <myon(at)debian(dot)org>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-11 13:45:46
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Nathan Bossart
> That seems like a good idea to me. I'm tempted to say that "fast" more
> accurately describes what's happening than "immediate." "Immediate" sounds
> like it happens instantaneously, but it's actually just happening "fast,"
> i.e., as fast as possible.

Ack.

> > #define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those
> > * belonging to unlogged tables */
> >
> > Maybe CHECKPOINT_FLUSH_UNLOGGED would be more explicit?
>
> WFM.

Do we want to change the checkpoint log message (and the new options)
only, or include the CHECKPOINT_* flags? (I would guess there aren't
many external users of these flags, but mmmv.)

> I thought it would make sense to put it closer to where these options are
> described, since it'll be most evident for manually-initiated checkpoints.

Ack, I'll add that.

> >> We might also want to make sure it's clear that CHECKPOINT does nothing if
> >> there's been no database activity since the last one (or, in the case of a
> >> restartpoint, if there hasn't been a checkpoint record).
> >
> > That's taken care of by "force":
> >
> > #define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */
>
> Oh, I see that we always specify that for CHECKPOINT commands, except for
> restartpoints. IIRC even if you do specify CHECKPOINT_FORCE for a
> restartpoint, it'll have no effect. It's proably worth mentioning that
> case, at least.

Right, will do.

Christoph


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-11 13:59:42
Message-ID: aEmLzinvZ-W8EPiW@nathan
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 11, 2025 at 03:45:46PM +0200, Christoph Berg wrote:
> Do we want to change the checkpoint log message (and the new options)
> only, or include the CHECKPOINT_* flags? (I would guess there aren't
> many external users of these flags, but mmmv.)

IMO we should try to make the terminology consistent everywhere. I'd
suggest putting the renaming stuff in separate prerequisite patches for
your new CHECKPOINT option.

--
nathan


From: Christoph Berg <myon(at)debian(dot)org>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-11 15:53:15
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Nathan Bossart
> IMO we should try to make the terminology consistent everywhere. I'd
> suggest putting the renaming stuff in separate prerequisite patches for
> your new CHECKPOINT option.

Ack, done in v4.

I haven't implemented a WAIT option yet since I didn't want to decide
that without more votes in either direction.

Christoph

Attachment Content-Type Size
v4-0001-Rename-checkpoint-options-immediate-and-flush-all.patch text/x-diff 17.7 KB
v4-0002-Add-mode-and-flush_unlogged-options-to-CHECKPOINT.patch text/x-diff 10.9 KB

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-11 21:32:35
Message-ID: aEn188NE3Yu6M0uf@nathan
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 11, 2025 at 05:53:15PM +0200, Christoph Berg wrote:
> Ack, done in v4.

Thanks! The overall shape of these patches looks pretty good to me. I'll
aim to give them a deeper review in the near future.

--
nathan


From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Christoph Berg <myon(at)debian(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-12 07:14:08
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2025-06-11 at 17:53 +0200, Christoph Berg wrote:
> I haven't implemented a WAIT option yet since I didn't want to decide
> that without more votes in either direction.

I had a look at it, and I have suggestions for the documentation.

> --- a/doc/src/sgml/ref/checkpoint.sgml
> +++ b/doc/src/sgml/ref/checkpoint.sgml
> [...]
> <para>
> - The <command>CHECKPOINT</command> command forces a fast
> - checkpoint when the command is issued, without waiting for a
> - regular checkpoint scheduled by the system (controlled by the settings in
> - <xref linkend="runtime-config-wal-checkpoints"/>).
> - <command>CHECKPOINT</command> is not intended for use during normal
> - operation.
> + Running <command>CHECKPOINT</command> is not required during normal
> + operation; the system schedules checkpoints automatically (controlled by
> + the settings in <xref linkend="runtime-config-wal-checkpoints"/>).
> + <command>CHECKPOINT</command> can be beneficial to run before operations
> + such as binary backups or shutting down <productname>PostgreSQL</productname>
> + as this can reduce the amount of data to be flushed for the next checkpoint
> + within these operations. Specifically, <literal>UNLOGGED</literal> table
> + data is otherwise only flushed by the shutdown checkpoint.
> </para>

How about the following for added clarity:

Running an explicit <command>CHECKPOINT</command> is not required during
normal operation; the system schedules checkpoints automatically (controlled
by the settings in <xref linkend="runtime-config-wal-checkpoints"/>).
However, it can be useful to perform an explicit checkpoint immediately
before shutting down the server or performing an online file system backup,
if you want the checkpoint implicit in these operations to be as fast as
possible. In particular, <literal>UNLOGGED</literal> table data only get
flushed to disk during a shutdown checkpoint, so you should use the option
<literal>FLUSH_UNLOGGED</literal> for an explicit checkpoint right before a
shutdown.

> <para>
> If executed during recovery, the <command>CHECKPOINT</command> command
> - will force a restartpoint (see <xref linkend="wal-configuration"/>)
> - rather than writing a new checkpoint.
> + will force a restartpoint rather than writing a new checkpoint. (The
> + operation will be a no-op if there is no corresponding checkpoint in the
> + write-ahead log.) If a checkpoint is already being written when a
> + <command>CHECKPOINT</command> is issued, the running checkpoint is upgraded
> + with this command's <literal>MODE</literal> and
> + <literal>FLUSH_UNLOGGED</literal> options.
> </para>

Oh, interesting; I wasn't aware of that. That means that running CHECKPOINT
(FLUSH_UNLOGGED) is less useful than I thought, since there is often already
a spread checkpoint running. Would it be useful to recommend that you should
run the command twice when aiming for a fast shutdown?

@@ -58,6 +69,54 @@ CHECKPOINT
</para>
</refsect1>

+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><literal>MODE</literal></term>
+ <listitem>
+ <para>
+ The <command>CHECKPOINT</command> command forces a fast checkpoint by
+ default when the command is issued, without waiting for a regular
+ checkpoint scheduled by the system.
+ </para>
+ <para>
+ A <literal>SPREAD</literal> checkpoint will instead spread out the write
+ load as determined by the
+ <xref linkend="guc-checkpoint-completion-target"/> setting, like the
+ system-scheduled checkpoints.
+ </para>
+ </listitem>
+ </varlistentry>

I find this somewhat confusing; how about

<para>
The default <literal>FAST</literal> mode causes the checkpoint to be
performed as fast as possible. A <literal>SPREAD</literal> checkpoint
will instead spread out the write load as determined by the
<xref linkend="guc-checkpoint-completion-target"/> setting, like the
system-scheduled checkpoints.
</para>

+ <varlistentry>
+ <term><literal>FLUSH_UNLOGGED</literal></term>
+ <listitem>
+ <para>
+ All data files except for those marked as <literal>UNLOGGED</literal>
+ are flushed to disk during a checkpoint. Enabling this option will also
+ flush <literal>UNLOGGED</literal> relations; the default is off.
+ </para>
+ </listitem>
+ </varlistentry>

How about:

<varlistentry>
<term><literal>FLUSH_UNLOGGED</literal></term>
<listitem>
<para>
Normally, data files marked as <literal>UNLOGGED</literal> are not
flushed to disk during a checkpoint. Enabling this option will also
flush <literal>UNLOGGED</literal> relations. This option is disabled
by default.
</para>
</listitem>
</varlistentry>

Yours,
Laurenz Albe


From: Christoph Berg <myon(at)debian(dot)org>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-16 14:36:59
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Laurenz Albe
> How about the following for added clarity:
>
> Running an explicit <command>CHECKPOINT</command> is not required during
> normal operation; the system schedules checkpoints automatically (controlled
> by the settings in <xref linkend="runtime-config-wal-checkpoints"/>).
> However, it can be useful to perform an explicit checkpoint immediately
> before shutting down the server or performing an online file system backup,
> if you want the checkpoint implicit in these operations to be as fast as
> possible. In particular, <literal>UNLOGGED</literal> table data only get
> flushed to disk during a shutdown checkpoint, so you should use the option
> <literal>FLUSH_UNLOGGED</literal> for an explicit checkpoint right before a
> shutdown.

Thanks for the suggestions.

I've taken this one, but replaced "be as fast as possible" by "have to
write out less data" to avoid confusion with the FAST mode.

> Oh, interesting; I wasn't aware of that. That means that running CHECKPOINT
> (FLUSH_UNLOGGED) is less useful than I thought, since there is often already
> a spread checkpoint running. Would it be useful to recommend that you should
> run the command twice when aiming for a fast shutdown?

I spent some time digging through the code, but I'm still not entirely
sure what's happening. There are several parts to it:

1) the list of buffers to flush is determined at the beginning of the
checkpoint, so running a 2nd FLUSH_UNLOGGED checkpoint will not make
the running checkpoint write these

2) running CHECKPOINT updates the checkpoint flags in shared memory so
I think the currently running checkpoint picks "MODE FAST" up and
speeds up. (But I'm not entirely sure, the call stack is quite deep
there.)

3) running CHECKPOINT (at least when waiting for it) seems to actually
start a new checkpoint, so FLUSH_UNLOGGED should still be effective.
(See the code arount "start_cv" in checkpointer.c)

Admittedly, adding these points together raises some question marks
about the flag handling, so I would welcome clarification by someone
more knowledgeable in this area.

> I find this somewhat confusing; how about
> How about:

Taken unmodified, thanks!

Christoph

Attachment Content-Type Size
v5-0001-Rename-checkpoint-options-immediate-and-flush-all.patch text/x-diff 17.7 KB
v5-0002-Add-mode-and-flush_unlogged-options-to-CHECKPOINT.patch text/x-diff 11.0 KB

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-16 15:18:41
Message-ID: aFA10XSdJ9hPGfD1@nathan
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 16, 2025 at 04:36:59PM +0200, Christoph Berg wrote:
> I spent some time digging through the code, but I'm still not entirely
> sure what's happening. There are several parts to it:
>
> 1) the list of buffers to flush is determined at the beginning of the
> checkpoint, so running a 2nd FLUSH_UNLOGGED checkpoint will not make
> the running checkpoint write these
>
> 2) running CHECKPOINT updates the checkpoint flags in shared memory so
> I think the currently running checkpoint picks "MODE FAST" up and
> speeds up. (But I'm not entirely sure, the call stack is quite deep
> there.)
>
> 3) running CHECKPOINT (at least when waiting for it) seems to actually
> start a new checkpoint, so FLUSH_UNLOGGED should still be effective.
> (See the code arount "start_cv" in checkpointer.c)
>
> Admittedly, adding these points together raises some question marks
> about the flag handling, so I would welcome clarification by someone
> more knowledgeable in this area.

I think you've got it right. With CHECKPOINT_WAIT set, RequestCheckpoint()
will wait for a new checkpoint to start, at which point we know that the
new flags have been seen by the checkpointer. If an immediate checkpoint
is pending, CheckpointWriteDelay() will skip sleeping in the
currently-running one, so the current checkpoint will be "upgraded" to
immediate in some sense, but IIUC there will still be another immediate
checkpoint after it completes. But AFAICT it doesn't pick up
FLUSH_UNLOGGED until the next checkpoint begins.

Another thing to note is what I mentioned earlier:

+ Note that the server may consolidate concurrently requested checkpoints or
+ restartpoints. Such consolidated requests will contain a combined set of
+ options. For example, if one session requested an immediate checkpoint and
+ another session requested a non-immediate checkpoint, the server may combine
+ these requests and perform one immediate checkpoint.

--
nathan


From: Christoph Berg <myon(at)debian(dot)org>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-16 15:43:54
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Nathan Bossart
> I think you've got it right. With CHECKPOINT_WAIT set, RequestCheckpoint()
> will wait for a new checkpoint to start, at which point we know that the
> new flags have been seen by the checkpointer. If an immediate checkpoint
> is pending, CheckpointWriteDelay() will skip sleeping in the
> currently-running one, so the current checkpoint will be "upgraded" to
> immediate in some sense, but IIUC there will still be another immediate
> checkpoint after it completes. But AFAICT it doesn't pick up
> FLUSH_UNLOGGED until the next checkpoint begins.

> Another thing to note is what I mentioned earlier:

Thanks. I now have this:

<para>
If a checkpoint is already running when a <command>CHECKPOINT</command>
is issued, a new checkpoint is queued. The server will consolidate multiple
concurrently requested checkpoints or restartpoints and merge their options.
For example, if one session requests a fast checkpoint and another session
requests a spread checkpoint, the server combines these requests and
performs one fast checkpoint. Queing a fast checkpoint will also switch a
currently running spread checkpoint to run fast.
</para>

Christoph

Attachment Content-Type Size
v6-0001-Rename-checkpoint-options-immediate-and-flush-all.patch text/x-diff 17.7 KB
v6-0002-Add-mode-and-flush_unlogged-options-to-CHECKPOINT.patch text/x-diff 11.3 KB

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-18 16:15:15
Message-ID: aFLmE2zl038I7-F1@nathan
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

I broke this up into several small patches. Notable changes are as
follows:

* Adjusted to the tab completion code to work more like the VACUUM utility
options.

* Introduced a new ExecCheckpoint() function in checkpointer.c and moved
the privilege check and options parsing there.

* Removed the notes in the docs about when to use the CHECKPOINT command.
I'm not opposed to adding something like that (in fact, I think it's a
good idea), but IMHO we should bikeshed on that separately, maybe even in
a new thread.

Thoughts?

--
nathan

Attachment Content-Type Size
v7-0001-rename-CHECKPOINT_FLUSH_ALL-to-CHECKPOINT_FLUSH_U.patch text/plain 4.8 KB
v7-0002-rename-CHECKPOINT_IMMEDIATE-to-CHECKPOINT_FAST.patch text/plain 16.5 KB
v7-0003-add-option-list-to-checkpoint-command.patch text/plain 6.0 KB
v7-0004-add-mode-option-to-checkpoint-command.patch text/plain 6.8 KB
v7-0005-add-flush_unlogged-option-to-checkpoint-command.patch text/plain 4.8 KB

From: Christoph Berg <myon(at)debian(dot)org>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CHECKPOINT unlogged data
Date: 2025-06-18 20:09:01
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Re: Nathan Bossart
> I broke this up into several small patches. Notable changes are as
> follows:
>
> * Adjusted to the tab completion code to work more like the VACUUM utility
> options.
>
> * Introduced a new ExecCheckpoint() function in checkpointer.c and moved
> the privilege check and options parsing there.

Ack. I was pondering if the code was growing too big there, but didn't
want to change too many things at once.

> * Removed the notes in the docs about when to use the CHECKPOINT command.
> I'm not opposed to adding something like that (in fact, I think it's a
> good idea), but IMHO we should bikeshed on that separately, maybe even in
> a new thread.

I would have thought this already happened here.

> Thoughts?

Fine with me, thanks!

Christoph