Fix more race conditions in the newly-added pg_rewind test.
author Heikki Linnakangas <[email protected]>
Mon, 7 Dec 2020 12:44:34 +0000 (14:44 +0200)
committer Heikki Linnakangas <[email protected]>
Mon, 7 Dec 2020 12:55:28 +0000 (14:55 +0200)
commit beb6b45ab7470b0412edb565eaa6f48683245d47
tree 32f617a508ae3e0e59e1b1936df209ac3d974ee4
parent 1dd608bbac28a5dfcaade0ffb56f0dc4f61f7320

pg_rewind looks at the control file to check what timeline a server is on.
But promotion doesn't immediately write a checkpoint; it merely writes
an end-of-recovery WAL record. If pg_rewind runs immediately after
promotion, before the checkpoint has completed, it will think that
the server is still on the earlier timeline. We already ran into this
issue a long time ago; see commit 484a848a73f.
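The race can be illustrated with a toy model. This is a hypothetical sketch, not PostgreSQL code. ToyServer, promote(), checkpoint() and timeline_seen_by_rewind() are made-up names. The model assumes, as described above, that promotion only appends an end-of-recovery record and that only a completed checkpoint updates the timeline recorded in the control file.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical model (not PostgreSQL code): the timeline in the WAL
    stream is tracked separately from the timeline in the control file."""
    wal_tli: int = 1
    control_file_tli: int = 1
    wal: list = field(default_factory=list)

    def promote(self) -> None:
        # Promotion switches to a new timeline and writes an
        # end-of-recovery WAL record; it does not rewrite the control file.
        self.wal_tli += 1
        self.wal.append(("END_OF_RECOVERY", self.wal_tli))

    def checkpoint(self) -> None:
        # In this model, only a completed checkpoint updates the control file.
        self.control_file_tli = self.wal_tli
        self.wal.append(("CHECKPOINT", self.wal_tli))


def timeline_seen_by_rewind(server: ToyServer) -> int:
    # Hypothetical stand-in for pg_rewind: trust only the control file.
    return server.control_file_tli


if __name__ == "__main__":
    srv = ToyServer()
    srv.promote()
    print(timeline_seen_by_rewind(srv))  # prints 1 (stale timeline)
    srv.checkpoint()
    print(timeline_seen_by_rewind(srv))  # prints 2
```

Between promote() and checkpoint(), the WAL is already on timeline 2 while the control file still says 1. That window is the one the buildfarm failure hit.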

It's a bit bogus that pg_rewind doesn't determine the timeline correctly
until the end-of-recovery checkpoint has completed. We probably should
fix that. But for now, work around it by waiting for the checkpoint
to complete before running pg_rewind, like we did in commit 484a848a73f.
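The workaround amounts to not reading the control file until the checkpoint has finished. The sketch below is a hypothetical Python model, not the actual Perl TAP test code, and it does not reproduce the test's own helpers. It assumes the end-of-recovery checkpoint completes in a background thread after an arbitrary delay. The caller then polls the control-file timeline with a timeout before it would run pg_rewind. ToyServer, promote(), read_control_file_tli() and wait_for_timeline() are all made-up names.

```python
import threading
import time


class ToyServer:
    """Hypothetical model: promotion leaves the end-of-recovery checkpoint
    to a background thread, so the control file lags behind for a while."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._control_file_tli = 1

    def promote(self, checkpoint_delay: float = 0.05) -> None:
        new_tli = self.read_control_file_tli() + 1

        def checkpointer() -> None:
            time.sleep(checkpoint_delay)  # the checkpoint takes a while
            with self._lock:
                self._control_file_tli = new_tli  # control file updated last

        threading.Thread(target=checkpointer, daemon=True).start()

    def read_control_file_tli(self) -> int:
        with self._lock:
            return self._control_file_tli


def wait_for_timeline(server: ToyServer, expected_tli: int,
                      timeout: float = 5.0, interval: float = 0.01) -> bool:
    """Poll the control file until it reports expected_tli.

    Returns False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if server.read_control_file_tli() == expected_tli:
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    srv = ToyServer()
    srv.promote()
    # Reading the control file right now would be racy; wait first.
    if wait_for_timeline(srv, expected_tli=2):
        print("checkpoint done, safe to run pg_rewind")
```

Polling with a deadline keeps a stuck checkpoint from hanging the test forever. Forcing a checkpoint on the promoted node and waiting for it to return would serve the same purpose.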

In passing, tidy up the new test a little bit. Reorder the INSERTs so
that the comments make more sense, remove a spurious CHECKPOINT call after
pg_rewind has already run, and add the --debug option, so that if this
fails again, we'll have more data.

Per buildfarm failure at https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=rorqual&dt=2020-12-06%2018%3A32%3A19&stg=pg_rewind-check.
Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/1713707e-e318-761c-d287-5b6a4aa807e8@iki.fi
src/bin/pg_rewind/t/008_min_recovery_point.pl