Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

Lists: pgsql-hackers
From: Dave Cramer <davecramer(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2018-07-24 15:52:12
Message-ID: CADK3HH+EDcP1gGXyM0sNXkkm5po7OXwyjy9XuU-beCCyWhnrPg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers
Attachment Content-Type Size
0004-Add-test-for-pg_recvlogical-to-stop-replication.patch application/octet-stream 7.2 KB
0003-Add-ability-for-pg_recvlogical-to-stop-replication-f.patch application/octet-stream 14.8 KB
0002-Client-initiated-CopyDone-during-transaction-decodin.patch application/octet-stream 11.1 KB
0001-Respect-client-initiated-CopyDone-in-walsender.patch application/octet-stream 3.0 KB

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: davecramer(at)gmail(dot)com, Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2018-11-19 15:58:45
Message-ID: CA+q6zcWhtvXT5N0ZTrsY_Z5K54fo8TuHpky+FT=9tidVHO5=yA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

> On Tue, Jul 24, 2018 at 5:52 PM Dave Cramer <davecramer(at)gmail(dot)com> wrote:
>
> Back in 2016 a patch was proposed that seems to have died on the vine. See https://www.postgresql.org/message-id/flat/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w(at)mail(dot)gmail(dot)com
> for the history and https://commitfest.postgresql.org/10/621/ for the commitfest entry.
> I've rebased the patches and attached them for consideration.
> JDBC tests here https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/test/java/org/postgresql/replication/LogicalReplicationTest.java all pass

Unfortunately the second patch from the series can't be applied anymore, could
you please rebase it one more time? Other than that it looks strange for me
that the corresponding discussion stopped when it was quite close to be in a
good shape, bouncing from "rejected with feedback" to "needs review". Can
someone from involved people clarify what's the current status of this patch?


From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Dave Cramer <davecramer(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2018-11-30 23:01:01
Message-ID: CA+q6zcW+ukxvTMkcGcefdfQtc0ZBD5osD_QqL3iMAx3CEDfeTw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

> On Mon, Nov 19, 2018 at 4:58 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
>
> > On Tue, Jul 24, 2018 at 5:52 PM Dave Cramer <davecramer(at)gmail(dot)com> wrote:
> >
> > Back in 2016 a patch was proposed that seems to have died on the vine. See https://www.postgresql.org/message-id/flat/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w(at)mail(dot)gmail(dot)com
> > for the history and https://commitfest.postgresql.org/10/621/ for the commitfest entry.
> > I've rebased the patches and attached them for consideration.
> > JDBC tests here https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/test/java/org/postgresql/replication/LogicalReplicationTest.java all pass
>
> Unfortunately the second patch from the series can't be applied anymore, could
> you please rebase it one more time? Other than that it looks strange for me
> that the corresponding discussion stopped when it was quite close to be in a
> good shape, bouncing from "rejected with feedback" to "needs review". Can
> someone from involved people clarify what's the current status of this patch?

Looks like I'm a failure as a neuromancer, since revival didn't happen. I'm
marking it as returned with feedback.


From: Dave Cramer <davecramer(at)gmail(dot)com>
To: 9erthalion6(at)gmail(dot)com
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2018-11-30 23:16:01
Message-ID: CADK3HHJtek=R6QEPMCvaeub64fOGhTeZ9A27LYShsaPTVCpDHA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Why is this being closed? I did not see the first email looking for
clarification.

The history is the original author dropped off the planet (no idea where he
is)

I can certainly rebase it.

Dave Cramer

On Fri, 30 Nov 2018 at 18:00, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:

> > On Mon, Nov 19, 2018 at 4:58 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
> wrote:
> >
> > > On Tue, Jul 24, 2018 at 5:52 PM Dave Cramer <davecramer(at)gmail(dot)com>
> wrote:
> > >
> > > Back in 2016 a patch was proposed that seems to have died on the vine.
> See
> https://www.postgresql.org/message-id/flat/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w(at)mail(dot)gmail(dot)com
> > > for the history and https://commitfest.postgresql.org/10/621/ for the
> commitfest entry.
> > > I've rebased the patches and attached them for consideration.
> > > JDBC tests here
> https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/test/java/org/postgresql/replication/LogicalReplicationTest.java
> all pass
> >
> > Unfortunately the second patch from the series can't be applied anymore,
> could
> > you please rebase it one more time? Other than that it looks strange for
> me
> > that the corresponding discussion stopped when it was quite close to be
> in a
> > good shape, bouncing from "rejected with feedback" to "needs review". Can
> > someone from involved people clarify what's the current status of this
> patch?
>
> Looks like I'm a failure as a neuromancer, since revival didn't happen. I'm
> marking it as returned with feedback.
>


From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Dave Cramer <davecramer(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2018-11-30 23:23:13
Message-ID: CA+q6zcW7OuUwtZO_8aeU2wyaT1PDDu2+hUb20WBiStPiVWmRdA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 1, 2018 at 12:17 AM Dave Cramer <davecramer(at)gmail(dot)com> wrote:
>
> Why is this being closed? I did not see the first email looking for clarification.

Well, mostly due total absence of response and broken mind reading crystal ball.

> I can certainly rebase it.

Yes, please do. I'll change the CF item status back.


From: Dave Cramer <davecramer(at)gmail(dot)com>
To: 9erthalion6(at)gmail(dot)com
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2018-11-30 23:49:27
Message-ID: CADK3HHK1YpdYQ5BAksGzkScLDsBOGRGPcVeC6H8BDQEKvz=k+g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Dmitry,

Thanks, I have done a preliminary check and it seems pretty straightforward.

I will clean it up for Monday

Thanks for your patience!

Dave Cramer

On Fri, 30 Nov 2018 at 18:22, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:

> On Sat, Dec 1, 2018 at 12:17 AM Dave Cramer <davecramer(at)gmail(dot)com> wrote:
> >
> > Why is this being closed? I did not see the first email looking for
> clarification.
>
> Well, mostly due total absence of response and broken mind reading crystal
> ball.
>
> > I can certainly rebase it.
>
> Yes, please do. I'll change the CF item status back.
>


From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Dave Cramer <davecramer(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2018-11-30 23:53:21
Message-ID: CA+q6zcUq1fa09VckBbAa-xdx0gFimwNK6cq_8c9KCywVe1Cj6A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

>On Sat, Dec 1, 2018 at 12:49 AM Dave Cramer <davecramer(at)gmail(dot)com> wrote:
>
> Thanks, I have done a preliminary check and it seems pretty straightforward.
>
> I will clean it up for Monday

Great, thank you!


From: Dave Cramer <davecramer(at)gmail(dot)com>
To: 9erthalion6(at)gmail(dot)com
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2018-12-03 11:38:43
Message-ID: CADK3HHJOd4oDFrSRHmUWz12HpdCgBagwvkLjKErNmbm5MLX7VQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Dmitry,

Please see attached rebased patches

Dave Cramer

On Fri, 30 Nov 2018 at 18:52, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:

> >On Sat, Dec 1, 2018 at 12:49 AM Dave Cramer <davecramer(at)gmail(dot)com> wrote:
> >
> > Thanks, I have done a preliminary check and it seems pretty
> straightforward.
> >
> > I will clean it up for Monday
>
> Great, thank you!
>

Attachment Content-Type Size
0004-Add-test-for-pg_recvlogical-to-stop-replication.patch application/octet-stream 7.2 KB
0003-Add-ability-for-pg_recvlogical-to-stop-replication-f.patch application/octet-stream 14.8 KB
0001-Respect-client-initiated-CopyDone-in-walsender.patch application/octet-stream 2.9 KB
0002-Client-initiated-CopyDone-during-transaction-decodin.patch application/octet-stream 11.1 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Dave Cramer <davecramer(at)gmail(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2019-01-14 04:18:53
Message-ID: CAMsr+YEdUwc2w9ksJ3ezyFKbtDSPqdoUxP4b3BSSTH+dfY=EnA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 3 Dec 2018 at 19:38, Dave Cramer <davecramer(at)gmail(dot)com> wrote:

> Dmitry,
>
> Please see attached rebased patches
>

I'm fine with patch 0001, though I find this comment a bit hard to follow:

+ * The send_data callback must enqueue complete CopyData messages to libpq
+ * using pq_putmessage_noblock or similar, since the walsender loop may
send
+ * CopyDone then exit and return to command mode in response to a client
+ * CopyDone between calls to send_data.

I think it needs splitting up into a couple of sentences.

In patch 0002, stopping during a txn. It's important that once we skip any
action, we continue skipping. In patch 0002 I'd like it to be clearer that
we will *always* skip the rb->commit callback if rb->continue_decoding_cb()
returned false and aborted the while loop. I suggest storing the result of
(rb->continue_decoding_cb == NULL || rb->continue_decoding_cb()) in an
assignment within the while loop, and testing that result later.

e.g.

(continue_decoding = (rb->continue_decoding_cb == NULL ||
rb->continue_decoding_cb()))

and later

if (continue_decoding) {
rb->commit(rb, txn, commit_lsn);
}

I don't actually think it's necessary to re-test the continue_decoding
callback and skip commit here. If we've sent all the of the txn
except the commit, we might as well send the commit, it's tiny and might
save some hassle later.

I think a comment on the skipped commit would be good too:

/*
* If we skipped any changes due to a client CopyDone we must not send a
commit
* otherwise the client would incorrectly think it received a complete
transaction.
*/

I notice that the fast-path logic in WalSndWriteData is removed by this
patch. It isn't moved; there's no comparison of last_reply_timestamp
and wal_sender_timeout now. What's the rationale there? You want to ensure
that we reach ProcessRepliesIfAny() ? Can we cheaply test for a readable
client socket then still take the fast-path if it's not readable?

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise


From: Dave Cramer <davecramer(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2019-01-15 12:53:57
Message-ID: CADK3HHJ725x_4PG+xyCUr2DguXy0188DU-+6gJ8A1AbrCAaB=w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Dave Cramer

On Sun, 13 Jan 2019 at 23:19, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On Mon, 3 Dec 2018 at 19:38, Dave Cramer <davecramer(at)gmail(dot)com> wrote:
>
>> Dmitry,
>>
>> Please see attached rebased patches
>>
>
> I'm fine with patch 0001, though I find this comment a bit hard to follow:
>
> + * The send_data callback must enqueue complete CopyData messages to libpq
> + * using pq_putmessage_noblock or similar, since the walsender loop may
> send
> + * CopyDone then exit and return to command mode in response to a client
> + * CopyDone between calls to send_data.
>
>
I think it needs splitting up into a couple of sentences.
>
> Fair point, remember it was originally written by a non-english speaker

>
> In patch 0002, stopping during a txn. It's important that once we skip any
> action, we continue skipping. In patch 0002 I'd like it to be clearer that
> we will *always* skip the rb->commit callback if rb->continue_decoding_cb()
> returned false and aborted the while loop. I suggest storing the result of
> (rb->continue_decoding_cb == NULL || rb->continue_decoding_cb()) in an
> assignment within the while loop, and testing that result later.
>
> e.g.
>
> (continue_decoding = (rb->continue_decoding_cb == NULL ||
> rb->continue_decoding_cb()))
>
> and later
>
> if (continue_decoding) {
> rb->commit(rb, txn, commit_lsn);
> }
>
> Will do

> I don't actually think it's necessary to re-test the continue_decoding
> callback and skip commit here. If we've sent all the of the txn
> except the commit, we might as well send the commit, it's tiny and might
> save some hassle later.
>
>
> I think a comment on the skipped commit would be good too:
>
> /*
> * If we skipped any changes due to a client CopyDone we must not send a
> commit
> * otherwise the client would incorrectly think it received a complete
> transaction.
> */
>
> I notice that the fast-path logic in WalSndWriteData is removed by this
> patch. It isn't moved; there's no comparison of last_reply_timestamp
> and wal_sender_timeout now. What's the rationale there? You want to ensure
> that we reach ProcessRepliesIfAny() ? Can we cheaply test for a readable
> client socket then still take the fast-path if it's not readable?
>

This may have been a mistake as I am taking this over from a very old patch
that I did not originally write. I'll have a look at this

I

>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> 2ndQuadrant - PostgreSQL Solutions for the Enterprise
>


From: Dave Cramer <davecramer(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2019-01-20 16:40:19
Message-ID: CADK3HHLVvu_BMWNUEeFYWZ1G_4ia3pOGjG4qw5kZZa3wqOJmhA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Dave Cramer

On Tue, 15 Jan 2019 at 07:53, Dave Cramer <davecramer(at)gmail(dot)com> wrote:

>
> Dave Cramer
>
>
> On Sun, 13 Jan 2019 at 23:19, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
>> On Mon, 3 Dec 2018 at 19:38, Dave Cramer <davecramer(at)gmail(dot)com> wrote:
>>
>>> Dmitry,
>>>
>>> Please see attached rebased patches
>>>
>>
>> I'm fine with patch 0001, though I find this comment a bit hard to follow:
>>
>> + * The send_data callback must enqueue complete CopyData messages to
>> libpq
>> + * using pq_putmessage_noblock or similar, since the walsender loop may
>> send
>> + * CopyDone then exit and return to command mode in response to a client
>> + * CopyDone between calls to send_data.
>>
>>
> I think it needs splitting up into a couple of sentences.
>>
>> Fair point, remember it was originally written by a non-english speaker
>

Thoughts on below ?

+/*
+ * Main loop of walsender process that streams the WAL over Copy messages.
+ *
+ * The send_data callback must enqueue complete CopyData messages to the
client
+ * using pq_putmessage_noblock or similar
+ * In order to preserve the protocol it is necessary to send all of the
existing
+ * messages still in the buffer as the WalSender loop may send
+ * CopyDone then exit and return to command mode in response to a client
+ * CopyDone between calls to send_data.
+ */

>
>> In patch 0002, stopping during a txn. It's important that once we skip
>> any action, we continue skipping. In patch 0002 I'd like it to be clearer
>> that we will *always* skip the rb->commit callback
>> if rb->continue_decoding_cb() returned false and aborted the while loop. I
>> suggest storing the result of (rb->continue_decoding_cb == NULL ||
>> rb->continue_decoding_cb()) in an assignment within the while loop, and
>> testing that result later.
>>
>> e.g.
>>
>> (continue_decoding = (rb->continue_decoding_cb == NULL ||
>> rb->continue_decoding_cb()))
>>
>> and later
>>
>> if (continue_decoding) {
>> rb->commit(rb, txn, commit_lsn);
>> }
>>
>> Will do
>

Hmmm... I don't actually see how this is any different than what we have in
the patch now where below would the test occur?

> I don't actually think it's necessary to re-test the continue_decoding
>> callback and skip commit here. If we've sent all the of the txn
>> except the commit, we might as well send the commit, it's tiny and might
>> save some hassle later.
>>
>>
>> I think a comment on the skipped commit would be good too:
>>
>> /*
>> * If we skipped any changes due to a client CopyDone we must not send a
>> commit
>> * otherwise the client would incorrectly think it received a complete
>> transaction.
>> */
>>
>> I notice that the fast-path logic in WalSndWriteData is removed by this
>> patch. It isn't moved; there's no comparison of last_reply_timestamp
>> and wal_sender_timeout now. What's the rationale there? You want to ensure
>> that we reach ProcessRepliesIfAny() ? Can we cheaply test for a readable
>> client socket then still take the fast-path if it's not readable?
>>
>
> This may have been a mistake as I am taking this over from a very old
> patch that I did not originally write. I'll have a look at this
>

OK, I'm trying to decipher the original intent of this patch as well as I
didn't write it.

There are some hints here
https://www.postgresql.org/message-id/CAFgjRd1LgVbtH%3D9O9_xvKQjvUP7aRF-edxqwKfaNs9hMFW_4gw%40mail.gmail.com

As to why the fast path logic was removed. Does it make sense to you?

Dave

>
>>


From: Andres Freund <andres(at)anarazel(dot)de>
To: Dave Cramer <davecramer(at)gmail(dot)com>
Cc: 9erthalion6(at)gmail(dot)com, Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2019-02-16 03:01:50
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2018-12-03 06:38:43 -0500, Dave Cramer wrote:
> From 4d023cfc1fed0b5852b4da1aad6a32549b03ce26 Mon Sep 17 00:00:00 2001
> From: Dave Cramer <davecramer(at)gmail(dot)com>
> Date: Fri, 30 Nov 2018 18:23:49 -0500
> Subject: [PATCH 1/5] Respect client initiated CopyDone in walsender
>
> ---
> src/backend/replication/walsender.c | 36 ++++++++++++++++++++++++++++++------
> 1 file changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> index 46edb52..93f2648 100644
> --- a/src/backend/replication/walsender.c
> +++ b/src/backend/replication/walsender.c
> @@ -770,6 +770,14 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
> sendTimeLineValidUpto = state->currTLIValidUntil;
> sendTimeLineNextTLI = state->nextTLI;
>
> + /*
> + * If the client sent CopyDone while we were waiting,
> + * bail out so we can wind up the decoding session.
> + */
> + if (streamingDoneSending)
> + return -1;
> +
> + /* more than one block available */
> /* make sure we have enough WAL available */
> flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
>
> @@ -1341,8 +1349,12 @@ WalSndWaitForWal(XLogRecPtr loc)
> * It's important to do this check after the recomputation of
> * RecentFlushPtr, so we can send all remaining data before shutting
> * down.
> - */
> - if (got_STOPPING)
> + *
> + * We'll also exit here if the client sent CopyDone because it wants
> + * to return to command mode.
> + */
> +
> + if (got_STOPPING || streamingDoneReceiving)
> break;
>
> /*
> @@ -2095,7 +2107,14 @@ WalSndCheckTimeOut(void)
> }
> }
>
> -/* Main loop of walsender process that streams the WAL over Copy messages. */
> +/*
> + * Main loop of walsender process that streams the WAL over Copy messages.
> + *
> + * The send_data callback must enqueue complete CopyData messages to libpq
> + * using pq_putmessage_noblock or similar, since the walsender loop may send
> + * CopyDone then exit and return to command mode in response to a client
> + * CopyDone between calls to send_data.
> + */

Wait, how is it ok to end CopyDone before all the pending data has been
sent out?

> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index 23466ba..66b6e90 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -1497,7 +1497,9 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
> rb->begin(rb, txn);
>
> iterstate = ReorderBufferIterTXNInit(rb, txn);
> - while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL)
> + while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL &&
> + (rb->continue_decoding_cb == NULL ||
> + rb->continue_decoding_cb()))
> {
> Relation relation = NULL;
> Oid reloid;

> @@ -1774,8 +1776,11 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
> ReorderBufferIterTXNFinish(rb, iterstate);
> iterstate = NULL;
>
> - /* call commit callback */
> - rb->commit(rb, txn, commit_lsn);
> + if (rb->continue_decoding_cb == NULL || rb->continue_decoding_cb())
> + {
> + /* call commit callback */
> + rb->commit(rb, txn, commit_lsn);
> + }

I'm doubtful it's ok to simply stop in the middle of a transaction.

> @@ -1194,17 +1212,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
>
> CHECK_FOR_INTERRUPTS();
>
> - /* Try to flush pending output to the client */
> - if (pq_flush_if_writable() != 0)
> - WalSndShutdown();
> -
> - /* Try taking fast path unless we get too close to walsender timeout. */
> - if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> - wal_sender_timeout / 2) &&
> - !pq_is_send_pending())
> - {
> - return;
> - }

As somebody else commented on the thread, I'm also doubtful this is
ok. This'll introduce significant additional blocking unless I'm missing
something?

> /* If we have pending write here, go to slow path */
> for (;;)
> @@ -1358,7 +1365,14 @@ WalSndWaitForWal(XLogRecPtr loc)
> break;
>
> /*
> - * We only send regular messages to the client for full decoded
> + * If we have received CopyDone from the client, sent CopyDone
> + * ourselves, it's time to exit streaming.
> + */
> + if (!IsStreamingActive()) {
> + break;
> + }

Wrong formatting.

I wonder if the saner approach here isn't to support query cancellations
or something of that vein, and then handle the error.

Greetings,

Andres Freund


From: Dave Cramer <davecramer(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2019-02-16 20:38:03
Message-ID: CADK3HH+WcV+YbpQf7ZAC8BsySw90vwS=8cGx-GdBF_=GOeOFcg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andres,

Thanks for looking at this. FYI, I did not originally write this, rather
the original author has not replied to requests.
JDBC could use this, I assume others could as well.

That said I'm certainly open to suggestions on how to do this.

Craig, do you have any other ideas?

Dave Cramer

On Fri, 15 Feb 2019 at 22:01, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2018-12-03 06:38:43 -0500, Dave Cramer wrote:
> > From 4d023cfc1fed0b5852b4da1aad6a32549b03ce26 Mon Sep 17 00:00:00 2001
> > From: Dave Cramer <davecramer(at)gmail(dot)com>
> > Date: Fri, 30 Nov 2018 18:23:49 -0500
> > Subject: [PATCH 1/5] Respect client initiated CopyDone in walsender
> >
> > ---
> > src/backend/replication/walsender.c | 36
> ++++++++++++++++++++++++++++++------
> > 1 file changed, 30 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/backend/replication/walsender.c
> b/src/backend/replication/walsender.c
> > index 46edb52..93f2648 100644
> > --- a/src/backend/replication/walsender.c
> > +++ b/src/backend/replication/walsender.c
> > @@ -770,6 +770,14 @@ logical_read_xlog_page(XLogReaderState *state,
> XLogRecPtr targetPagePtr, int req
> > sendTimeLineValidUpto = state->currTLIValidUntil;
> > sendTimeLineNextTLI = state->nextTLI;
> >
> > + /*
> > + * If the client sent CopyDone while we were waiting,
> > + * bail out so we can wind up the decoding session.
> > + */
> > + if (streamingDoneSending)
> > + return -1;
> > +
> > + /* more than one block available */
> > /* make sure we have enough WAL available */
> > flushptr = WalSndWaitForWal(targetPagePtr + reqLen);
> >
> > @@ -1341,8 +1349,12 @@ WalSndWaitForWal(XLogRecPtr loc)
> > * It's important to do this check after the recomputation
> of
> > * RecentFlushPtr, so we can send all remaining data
> before shutting
> > * down.
> > - */
> > - if (got_STOPPING)
> > + *
> > + * We'll also exit here if the client sent CopyDone
> because it wants
> > + * to return to command mode.
> > + */
> > +
> > + if (got_STOPPING || streamingDoneReceiving)
> > break;
> >
> > /*
> > @@ -2095,7 +2107,14 @@ WalSndCheckTimeOut(void)
> > }
> > }
> >
> > -/* Main loop of walsender process that streams the WAL over Copy
> messages. */
> > +/*
> > + * Main loop of walsender process that streams the WAL over Copy
> messages.
> > + *
> > + * The send_data callback must enqueue complete CopyData messages to
> libpq
> > + * using pq_putmessage_noblock or similar, since the walsender loop may
> send
> > + * CopyDone then exit and return to command mode in response to a client
> > + * CopyDone between calls to send_data.
> > + */
>
> Wait, how is it ok to end CopyDone before all the pending data has been
> sent out?
>
>
>
> > diff --git a/src/backend/replication/logical/reorderbuffer.c
> b/src/backend/replication/logical/reorderbuffer.c
> > index 23466ba..66b6e90 100644
> > --- a/src/backend/replication/logical/reorderbuffer.c
> > +++ b/src/backend/replication/logical/reorderbuffer.c
> > @@ -1497,7 +1497,9 @@ ReorderBufferCommit(ReorderBuffer *rb,
> TransactionId xid,
> > rb->begin(rb, txn);
> >
> > iterstate = ReorderBufferIterTXNInit(rb, txn);
> > - while ((change = ReorderBufferIterTXNNext(rb, iterstate))
> != NULL)
> > + while ((change = ReorderBufferIterTXNNext(rb, iterstate))
> != NULL &&
> > + (rb->continue_decoding_cb == NULL ||
> > + rb->continue_decoding_cb()))
> > {
> > Relation relation = NULL;
> > Oid reloid;
>
> > @@ -1774,8 +1776,11 @@ ReorderBufferCommit(ReorderBuffer *rb,
> TransactionId xid,
> > ReorderBufferIterTXNFinish(rb, iterstate);
> > iterstate = NULL;
> >
> > - /* call commit callback */
> > - rb->commit(rb, txn, commit_lsn);
> > + if (rb->continue_decoding_cb == NULL ||
> rb->continue_decoding_cb())
> > + {
> > + /* call commit callback */
> > + rb->commit(rb, txn, commit_lsn);
> > + }
>
>
> I'm doubtful it's ok to simply stop in the middle of a transaction.
>
>
>
> > @@ -1194,17 +1212,6 @@ WalSndWriteData(LogicalDecodingContext *ctx,
> XLogRecPtr lsn, TransactionId xid,
> >
> > CHECK_FOR_INTERRUPTS();
> >
> > - /* Try to flush pending output to the client */
> > - if (pq_flush_if_writable() != 0)
> > - WalSndShutdown();
> > -
> > - /* Try taking fast path unless we get too close to walsender
> timeout. */
> > - if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> > -
> wal_sender_timeout / 2) &&
> > - !pq_is_send_pending())
> > - {
> > - return;
> > - }
>
> As somebody else commented on the thread, I'm also doubtful this is
> ok. This'll introduce significant additional blocking unless I'm missing
> something?
>
>
>
> > /* If we have pending write here, go to slow path */
> > for (;;)
> > @@ -1358,7 +1365,14 @@ WalSndWaitForWal(XLogRecPtr loc)
> > break;
> >
> > /*
> > - * We only send regular messages to the client for full
> decoded
> > + * If we have received CopyDone from the client, sent
> CopyDone
> > + * ourselves, it's time to exit streaming.
> > + */
> > + if (!IsStreamingActive()) {
> > + break;
> > + }
>
> Wrong formatting.
>
>
> I wonder if the saner approach here isn't to support query cancellations
> or something of that vein, and then handle the error.
>
> Greetings,
>
> Andres Freund
>


From: David Steele <david(at)pgmasters(dot)net>
To: Dave Cramer <davecramer(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2019-03-07 07:19:26
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 2/16/19 10:38 PM, Dave Cramer wrote:
>
> Thanks for looking at this. FYI, I did not originally write this, rather
> the original author has not replied to requests.
> JDBC could use this, I assume others could as well.
>
> That said I'm certainly open to suggestions on how to do this.
>
> Craig, do you have any other ideas?
This patch appears to be stalled. Are there any ideas on how to proceed?

Regards,
--
-David
david(at)pgmasters(dot)net


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Dave Cramer <davecramer(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2019-07-08 10:39:45
Message-ID: CA+hUKGJCUHP0WGu4AzC8L5Wo612-H5qq7LuqwLHyg4dEj0T3uw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 7, 2019 at 8:19 PM David Steele <david(at)pgmasters(dot)net> wrote:
> On 2/16/19 10:38 PM, Dave Cramer wrote:
> > Thanks for looking at this. FYI, I did not originally write this, rather
> > the original author has not replied to requests.
> > JDBC could use this, I assume others could as well.
> >
> > That said I'm certainly open to suggestions on how to do this.
> >
> > Craig, do you have any other ideas?

> This patch appears to be stalled. Are there any ideas on how to proceed?

Hi Dave (C),

It sounds like this should be withdrawn for now, and potentially
revived in some future CF if someone has time to work on it?

--
Thomas Munro
https://enterprisedb.com


From: Dave Cramer <davecramer(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Vladimir Gordiychuk <folyga(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk
Date: 2019-07-08 10:41:38
Message-ID: CADK3HHLHRVLeTKw4xJfJ=DtgG=N0+p+j+KAAndbX9F0XKKPfZA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 8 Jul 2019 at 06:40, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> On Thu, Mar 7, 2019 at 8:19 PM David Steele <david(at)pgmasters(dot)net> wrote:
> > On 2/16/19 10:38 PM, Dave Cramer wrote:
> > > Thanks for looking at this. FYI, I did not originally write this,
> rather
> > > the original author has not replied to requests.
> > > JDBC could use this, I assume others could as well.
> > >
> > > That said I'm certainly open to suggestions on how to do this.
> > >
> > > Craig, do you have any other ideas?
>
> > This patch appears to be stalled. Are there any ideas on how to proceed?
>
> Hi Dave (C),
>
> It sounds like this should be withdrawn for now, and potentially
> revived in some future CF if someone has time to work on it?
>
>
Hi Thomas,

I'm fine with that decision.

Thanks,

Dave Cramer

>
>
>