git.postgresql.org Git - postgresql.git/log

Fix subtransaction test for Python 3.10

Starting with Python 3.10, the stacktrace looks differently:
  -  PL/Python function "subtransaction_exit_subtransaction_in_with", line 3, in <module>
  -    s.__exit__(None, None, None)
  +  PL/Python function "subtransaction_exit_subtransaction_in_with", line 2, in <module>
  +    with plpy.subtransaction() as s:
Using try/except specifically makes the error look always the same.

(See https://github.com/python/cpython/pull/25719 for the discussion
of this change in Python.)

Author: Honza Horak <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/853083.1620749597%40sss.pgh.pa.us
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1959080

Document a few caveats in synchronous logical replication.

In a synchronous logical setup, locking [user] catalog tables can cause
deadlock. This is because logical decoding of transactions can lock
catalog tables to access them so exclusively locking those in transactions
can lead to deadlock. To avoid this users must refrain from having
exclusive locks on catalog tables.

Author: Takamichi Osumi
Reviewed-by: Vignesh C, Amit Kapila
Backpatch-through: 9.6
Discussion: https://www.postgresql.org/message-id/20210222222847.tpnb6eg3yiykzpky%40alap3.anarazel.de

Detect unused steps in isolation specs and do some cleanup

This is useful for developers to find out if an isolation spec is
over-engineered or if it needs more work by warning at the end of a
test run if a step is not used, generating a failure with extra diffs.

While on it, clean up all the specs which include steps not used in any
permutations to simplify them.

This is a backpatch of 989d23b and 06fdc4e, as it is becoming useful to
make all the branches consistent for an upcoming patch that will improve
the output generated by isolationtester.

Author: Michael Paquier
Reviewed-by: Asim Praveen, Melanie Plageman
Discussion: https://postgr.es/m/20190819080820 [email protected]
Discussion: https://postgr.es/m/794820.1623872009@sss.pgh.pa.us
Backpatch-through: 9.6

Remove dry-run mode from isolationtester

The original purpose of the dry-run mode is to be able to print all the
possible permutations from a spec file, but it has become less useful
since isolation tests have improved regarding deadlock detection as one
step not wanted by the author could block indefinitely now (originally
the step blocked would have been detected rather quickly). Per
discussion, let's remove it.

This is a backpatch of 9903338 for 9.6~12. It is proving to become
useful to have on those branches so as the code gets consistent across
all supported versions, as a matter of improving the output generated by
isolationtester.

Author: Michael Paquier
Reviewed-by: Asim Praveen, Melanie Plageman
Discussion: https://postgr.es/m/20190819080820 [email protected]
Discussion: https://postgr.es/m/794820.1623872009@sss.pgh.pa.us
Backpatch-through: 9.6

Fix plancache refcount leak after error in ExecuteQuery.

When stuffing a plan from the plancache into a Portal, one is
not supposed to risk throwing an error between GetCachedPlan and
PortalDefineQuery; if that happens, the plan refcount incremented
by GetCachedPlan will be leaked.  I managed to break this rule
while refactoring code in 9dbf2b7d7.  There is no visible
consequence other than some memory leakage, and since nobody is
very likely to trigger the relevant error conditions many times
in a row, it's not surprising we haven't noticed.  Nonetheless,
it's a bug, so rearrange the order of operations to remove the
hazard.

Noted on the way to looking for a better fix for bug #17053.
This mistake is pretty old, so back-patch to all supported
branches.

Further refinement of stuck_on_old_timeline recovery test

TestLib::perl2host can take a file argument as well as a directory
argument, so that code becomes substantially simpler. Also add comments
on why we're using forward slashes, and why we're setting
PERL_BADLANG=0.

Discussion: https://postgr.es/m/e9947bcd-20ee-027c-f0fe-01f736b7e345@dunslane.net

Fix decoding of speculative aborts.

During decoding for speculative inserts, we were relying for cleaning
toast hash on confirmation records or next change records. But that
could lead to multiple problems (a) memory leak if there is neither a
confirmation record nor any other record after toast insertion for a
speculative insert in the transaction, (b) error and assertion failures
if the next operation is not an insert/update on the same table.

The fix is to start queuing spec abort change and clean up toast hash
and change record during its processing. Currently, we are queuing the
spec aborts for both toast and main table even though we perform cleanup
while processing the main table's spec abort record. Later, if we have a
way to distinguish between the spec abort record of toast and the main
table, we can avoid queuing the change for spec aborts of toast tables.

Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com

Work around portability issue with newer versions of mktime().

Recent glibc versions have made mktime() fail if tm_isdst is
inconsistent with the prevailing timezone; in particular it fails for
tm_isdst = 1 when the zone is UTC.  (This seems wildly inconsistent
with the POSIX-mandated treatment of "incorrect" values for the other
fields of struct tm, so if you ask me it's a bug, but I bet they'll
say it's intentional.)  This has been observed to cause cosmetic
problems when pg_restore'ing an archive created in a different
timezone.

To fix, do mktime() using the field values from the archive, and if
that fails try again with tm_isdst = -1.  This will give a result
that's off by the UTC-offset difference from the original zone, but
that was true before, too.  It's not terribly critical since we don't
do anything with the result except possibly print it.  (Someday we
should flush this entire bit of logic and record a standard-format
timestamp in the archive instead.  That's not okay for a back-patched
bug fix, though.)

Also, guard our only other use of mktime() by having initdb's
build_time_t() set tm_isdst = -1 not 0.  This case could only have
an issue in zones that are DST year-round; but I think some do exist,
or could in future.

Per report from Wells Oliver.  Back-patch to all supported
versions, since any of them might need to run with a newer glibc.

Discussion: https://postgr.es/m/CAOC+FBWDhDHO7G-i1_n_hjRzCnUeFO+H-Czi1y10mFhRWpBrew@mail.gmail.com

Further tweaks to stuck_on_old_timeline recovery test

Translate path slashes on target directory path. This was confusing old
branches, but is applied to all branches for the sake of uniformity.
Perl is perfectly able to understand paths with forward slashes.

Along the way, restore the previous archive_wait query, for the sake of
uniformity with other tests, per gripe from Tom Lane.

Ignore more environment variables in pg_regress.c

This is similar to the work done in 8279f68 for TestLib.pm, where
environment variables set may cause unwanted failures if using a
temporary installation with pg_regress. The list of variables reset is
adjusted in each stable branch depending on what is supported.

Comments are added to remember that the lists in TestLib.pm and
pg_regress.c had better be kept in sync.

Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 9.6

Restore robustness of TAP tests that wait for postmaster restart.

Several TAP tests use poll_query_until() to wait for the postmaster
to restart.  They were checking to see if a trivial query
(e.g. "SELECT 1") succeeds.  However, that's problematic in the wake
of commit 11e9caff8, because now that we feed said query to psql
via stdin, we risk IPC::Run whining about a SIGPIPE failure if psql
quits before reading the query.  Hence, we can't use a nonempty
query in cases where we need to wait for connection failures to
stop happening.

Per the precedent of commits c757a3da0 and 6d41dd045, we can pass
"undef" as the query in such cases to ensure that IPC::Run has
nothing to write.  However, then we have to say that the expected
output is empty, and this exposes a deficiency in poll_query_until:
if psql fails altogether and returns empty stdout, poll_query_until
will treat that as a success!  That's because, contrary to its
documentation, it makes no actual check for psql failure, looking
neither at the exit status nor at stderr.

To fix that, adjust poll_query_until to insist on empty stderr as
well as a stdout match.  (I experimented with checking exit status
instead, but it seems that psql often does exit(1) in cases that we
need to consider successes.  That might be something to fix someday,
but it would be a non-back-patchable behavior change.)

Back-patch to v10.  The test cases needing this exist only as far
back as v11, but it seems wise to keep poll_query_until's behavior
the same in v10, in case we back-patch another such test case in
future.  (9.6 does not currently need this change, because in that
branch poll_query_until can't be told to accept empty stdout as
a success case.)

Per assorted buildfarm failures, mostly on hoverfly.

Discussion: https://postgr.es/m/CAA4eK1+zM6L4QSA1XMvXY_qqWwdUmqkOS1+hWvL8QcYEBGA1Uw@mail.gmail.com

Ensure pg_filenode_relation(0, 0) returns NULL.

Previously, a zero value for the relfilenode resulted in
a confusing error message about "unexpected duplicate".
This function returns NULL for other invalid relfilenode
values, so zero should be treated likewise.

It's been like this all along, so back-patch to all supported
branches.

Justin Pryzby

Discussion: https://postgr.es/m/20210612023324 [email protected]

Don't use Asserts to check for violations of replication protocol.

Using an Assert to check the validity of incoming messages is an
extremely poor decision.  In a debug build, it should not be that easy
for a broken or malicious remote client to crash the logrep worker.
The consequences could be even worse in non-debug builds, which will
fail to make such checks at all, leading to who-knows-what misbehavior.
Hence, promote every Assert that could possibly be triggered by wrong
or out-of-order replication messages to a full test-and-ereport.

To avoid bloating the set of messages the translation team has to cope
with, establish a policy that replication protocol violation error
reports don't need to be translated.  Hence, all the new messages here
use errmsg_internal().  A couple of old messages are changed likewise
for consistency.

Along the way, fix some non-idiomatic or outright wrong uses of
hash_search().

Most of these mistakes are new with the "streaming replication"
patch (commit 464824323), but a couple go back a long way.
Back-patch as appropriate.

Discussion: https://postgr.es/m/1719083.1623351052@sss.pgh.pa.us

Fix new recovery test for use under msys

Commit caba8f0d43 wasn't quite right for msys, as demonstrated by
several buildfarm animals, including jacana and fairywren. We need to
use the msys perl in the archive command, but call it in such a way that
Windows will understand the path. Furthermore, inside the copy script we
need to convert a Windows path to an msys path.

Remove PGSSLCRLDIR from the list of variables ignored in TAP tests

This variable was present in the list added by 9d660670, but it is not
supported by this branch. Issue noticed while diving into a similar
change for pg_regress.c.

Backpatch-through: 9.6

Adjust new test case to set wal_keep_segments.

Per buildfarm member conchuela and Kyotaro Horiguchi, it's possible
for the WAL segment that the cascading standby needs to be removed
too quickly. Hopefully this will prevent that.

Kyotaro Horiguchi

Discussion: http://postgr.es/m/20210610.101240.1270925505780628275 [email protected]

Fix corner case failure of new standby to follow new primary.

This only happens if (1) the new standby has no WAL available locally,
(2) the new standby is starting from the old timeline, (3) the promotion
happened in the WAL segment from which the new standby is starting,
(4) the timeline history file for the new timeline is available from
the archive but the WAL files for are not (i.e. this is a race),
(5) the WAL files for the new timeline are available via streaming,
and (6) recovery_target_timeline='latest'.

Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this
logic and was an improvement over the previous code, but it mishandled
this case. If recovery_target_timeline='latest' and restore_command is
set, validateRecoveryParameters() can change recoveryTargetTLI to be
different from receiveTLI. If streaming is then tried afterward,
expectedTLEs gets initialized with the history of the wrong timeline.
It's supposed to be a list of entries explaining how to get to the
target timeline, but in this case it ends up with a list of entries
explaining how to get to the new standby's original timeline, which
isn't right.

Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi.

Discussion: http://postgr.es/m/CAFiTN-sE-jr=LB8jQuxeqikd-Ux+jHiXyh4YDiZMPedgQKup0g@mail.gmail.com

Allow PostgresNode.pm's backup method to accept backup_options.

Partial back-port of commit 081876d75ea15c3bd2ee5ba64a794fd8ea46d794.
A test case for a pending bug fix needs this capability, but the code
on 9.6 is significantly different, so I'm only back-patching this
change as far as v10. We'll have to work around the problem another
way in v9.6.

Discussion: http://postgr.es/m/CAFiTN-tcivNvL0Rg6rD7_CErNfE75H7+gh9WbMxjbgsattja1Q@mail.gmail.com

Fix inconsistencies in psql --help=commands

The set of subcommands supported by \dAp, \do and \dy was described
incorrectly in psql's --help. The documentation was already consistent
with the code.

Reported-by: inoas, from IRC
Author: Matthijs van der Vleuten
Reviewed-by: Neil Chen
Discussion: https://postgr.es/m/6a984e24-2171-4039-9050-92d55e7b23fe@www.fastmail.com
Backpatch-through: 9.6

Fix incautious handling of possibly-miscoded strings in client code.

An incorrectly-encoded multibyte character near the end of a string
could cause various processing loops to run past the string's
terminating NUL, with results ranging from no detectable issue to
a program crash, depending on what happens to be in the following
memory.

This isn't an issue in the server, because we take care to verify
the encoding of strings before doing any interesting processing
on them.  However, that lack of care leaked into client-side code
which shouldn't assume that anyone has validated the encoding of
its input.

Although this is certainly a bug worth fixing, the PG security team
elected not to regard it as a security issue, primarily because
any untrusted text should be sanitized by PQescapeLiteral or
the like before being incorporated into a SQL or psql command.
(If an app fails to do so, the same technique can be used to
cause SQL injection, with probably much more dire consequences
than a mere client-program crash.)  Those functions were already
made proof against this class of problem, cf CVE-2006-2313.

To fix, invent PQmblenBounded() which is like PQmblen() except it
won't return more than the number of bytes remaining in the string.
In HEAD we can make this a new libpq function, as PQmblen() is.
It seems imprudent to change libpq's API in stable branches though,
so in the back branches define PQmblenBounded as a macro in the files
that need it.  (Note that just changing PQmblen's behavior would not
be a good idea; notably, it would completely break the escaping
functions' defense against this exact problem.  So we just want a
version for those callers that don't have any better way of handling
this issue.)

Per private report from houjingyi.  Back-patch to all supported branches.

Support use of strnlen() in pre-v11 branches.

Back-patch a minimal subset of commits fffd651e8 and 46912d9b1,
to support strnlen() on all platforms without adding any callers.
This will be needed by a following bug fix.

Fix compiler warning

Introduced by 41306a511c01dd299115cf447858a00e34aebbf6, happens with gcc
4.7.2.

Forward-port of 1ec36a9eb4c2, which was applied to 9.6 only.

Author: Peter Eisentraut <[email protected]>
Reported-by: Anton Voloshin <[email protected]>
Discussion: https://postgr.es/m/be8bbcdf-35f8-a8a6-098f-65c2e9497151@postgrespro.ru

In PostgresNode.pm, don't pass SQL to psql on the command line

The Msys shell mangles certain patterns in its command line, so avoid
handing arbitrary SQL to psql on the command line and instead use
IPC::Run's redirection facility for stdin. This pattern is already
mostly whats used, but query_poll_until() was not doing the right thing.

Problem discovered on the buildfarm when a new TAP test failed on msys.

Reduce risks of conflicts in internal queries of REFRESH MATVIEW CONCURRENTLY

The internal SQL queries used by REFRESH MATERIALIZED VIEW CONCURRENTLY
include some aliases for its diff and temporary relations with
rather-generic names: diff, newdata, newdata2 and mv.  Depending on the
queries used for the materialized view, using CONCURRENTLY could lead to
some internal failures if the query and those internal aliases conflict.

Those names have been chosen in 841c29c8.  This commit switches instead
to a naming pattern which is less likely going to cause conflicts, based
on an idea from Thomas Munro, by appending _$ to those aliases.  This is
not perfect as those new names could still conflict, but at least it has
the advantage to keep the code readable and simple while reducing the
likelihood of conflicts to be close to zero.

Reported-by: Mathis Rudolf
Author: Bharath Rupireddy
Reviewed-by: Bernd Helmle, Thomas Munro, Michael Paquier
Discussion: https://postgr.es/m/109c267a-10d2-3c53-b60e-720fcf44d9e8@credativ.de
Backpatch-through: 9.6

Ignore more environment variables in TAP tests

Various environment variables were not getting reset in the TAP tests,
which would cause failures depending on the tests or the environment
variables involved.  For example, PGSSL{MAX,MIN}PROTOCOLVERSION could
cause failures in the SSL tests.  Even worse, a junk value of
PGCLIENTENCODING makes a server startup fail.  The list of variables
reset is adjusted in each stable branch depending on what is supported.

While on it, simplify a bit the code per a suggestion from Andrew
Dunstan, using a list of variables instead of doing single deletions.

Reviewed-by: Andrew Dunstan, Daniel Gustafsson
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 9.6

Reject SELECT ... GROUP BY GROUPING SETS (()) FOR UPDATE.

This case should be disallowed, just as FOR UPDATE with a plain
GROUP BY is disallowed; FOR UPDATE only makes sense when each row
of the query result can be identified with a single table row.
However, we missed teaching CheckSelectLocking() to check
groupingSets as well as groupClause, so that it would allow
degenerate grouping sets. That resulted in a bad plan and
a null-pointer dereference in the executor.

Looking around for other instances of the same bug, the only one
I found was in examine_simple_variable(). That'd just lead to
silly estimates, but it should be fixed too.

Per private report from Yaoguang Chen.
Back-patch to all supported branches.

Raise a timeout to 180s, in test 010_logical_decoding_timelines.pl.

Per buildfarm member hornet. Also, update Pod documentation showing the
lower value. Back-patch to v10, where the test first appeared.

fix syntax error

Report configured port in MSVC built pg_config

This is a long standing omission, discovered when trying to write code
that relied on it.

Backpatch to all live branches.

Fix MSVC scripts when building with GSSAPI/Kerberos

The deliverables of upstream Kerberos on Windows are installed with
paths that do not match our MSVC scripts. First, the include folder was
named "inc/" in our scripts, but the upstream MSIs use "include/".
Second, the build would fail with 64-bit environments as the libraries
are named differently.

This commit adjusts the MSVC scripts to be compatible with the latest
installations of upstream, and I have checked that the compilation was
able to work with the 32-bit and 64-bit installations.

Special thanks to Kondo Yuta for the help in investigating the situation
in hamerkop, which had an incorrect configuration for the GSS
compilation.

Reported-by: Brian Ye
Discussion: https://postgr.es/m/162128202219.27274.12616756784952017465@wrigleys.postgresql.org
Backpatch-through: 9.6

doc: Fix description of some GUCs in docs and postgresql.conf.sample

The following parameters have been imprecise, or incorrect, about their
description (PGC_POSTMASTER or PGC_SIGHUP):
- autovacuum_work_mem (docs, as of 9.6~)
- huge_page_size (docs, as of 14~)
- max_logical_replication_workers (docs, as of 10~)
- max_sync_workers_per_subscription (docs, as of 10~)
- min_dynamic_shared_memory (docs, as of 14~)
- recovery_init_sync_method (postgresql.conf.sample, as of 14~)
- remove_temp_files_after_crash (docs, as of 14~)
- restart_after_crash (docs, as of 9.6~)
- ssl_min_protocol_version (docs, as of 12~)
- ssl_max_protocol_version (docs, as of 12~)

This commit adjusts the description of all these parameters to be more
consistent with the practice used for the others.

Revewed-by: Justin Pryzby
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 9.6

Disallow SSL renegotiation

SSL renegotiation is already disabled as of 48d23c72, however this does
not prevent the server to comply with a client willing to use
renegotiation.  In the last couple of years, renegotiation had its set
of security issues and flaws (like the recent CVE-2021-3449), and it
could be possible to crash the backend with a client attempting
renegotiation.

This commit takes one extra step by disabling renegotiation in the
backend in the same way as SSL compression (f9264d15) or tickets
(97d3a0b0).  OpenSSL 1.1.0h has added an option named
SSL_OP_NO_RENEGOTIATION able to achieve that.  In older versions
there is an option called SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS that
was undocumented, and could be set within the SSL object created when
the TLS connection opens, but I have decided not to use it, as it feels
trickier to rely on, and it is not official.  Note that this option is
not usable in OpenSSL < 1.1.0h as the internal contents of the *SSL
object are hidden to applications.

SSL renegotiation concerns protocols up to TLSv1.2.

Per original report from Robert Haas, with a patch based on a suggestion
by Andres Freund.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 9.6

Clean up cpluspluscheck violation.

"typename" is a C++ keyword, so pg_upgrade.h fails to compile in C++.
Fortunately, there seems no likely reason for somebody to need to
do that. Nonetheless, it's project policy that all .h files should
pass cpluspluscheck, so rename the argument to fix that.

Oversight in 57c081de0; back-patch as that was. (The policy requiring
pg_upgrade.h to pass cpluspluscheck only goes back to v12, but it
seems best to keep this code looking the same in all branches.)

Fix typo and outdated information in README.barrier

README.barrier didn't seem to get the memo when atomics were added. Fix
that.

Author: Tatsuo Ishii, David Rowley
Discussion: https://postgr.es/m/20210516.211133.2159010194908437625.t-ishii%40sraoss.co.jp
Backpatch-through: 9.6, oldest supported release

Be more careful about barriers when releasing BackgroundWorkerSlots.

ForgetBackgroundWorker lacked any memory barrier at all, while
BackgroundWorkerStateChange had one but unaccountably did
additional manipulation of the slot after the barrier. AFAICS,
the rule must be that the barrier is immediately before setting
or clearing slot->in_use.

It looks like back in 9.6 when ForgetBackgroundWorker was first
written, there might have been some case for not needing a
barrier there, but I'm not very convinced of that --- the fact
that the load of bgw_notify_pid is in the caller doesn't seem
to guarantee no memory ordering problem. So patch 9.6 too.

It's likely that this doesn't fix any observable bug on Intel
hardware, but machines with weaker memory ordering rules could
have problems here.

Discussion: https://postgr.es/m/4046084.1620244003@sss.pgh.pa.us

Doc: correct erroneous entry in this week's minor release notes.

The patch to disallow a NULL specification in combination with
GENERATED ... AS IDENTITY applied to both ALWAYS and BY DEFAULT
variants of that clause, not only the former.

Noted by Shay Rojansky.

Discussion: https://postgr.es/m/CADT4RqAwD3A=RvGiQU9AiTK-6VeuXcycwPHmJPv_OBCJFYOEww@mail.gmail.com

Prevent infinite insertion loops in spgdoinsert().

Formerly we just relied on operator classes that assert longValuesOK
to eventually shorten the leaf value enough to fit on an index page.
That fails since the introduction of INCLUDE-column support (commit
09c1c6ab4), because the INCLUDE columns might alone take up more
than a page, meaning no amount of leaf-datum compaction will get
the job done.  At least with spgtextproc.c, that leads to an infinite
loop, since spgtextproc.c won't throw an error for not being able
to shorten the leaf datum anymore.

To fix without breaking cases that would otherwise work, add logic
to spgdoinsert() to verify that the leaf tuple size is decreasing
after each "choose" step.  Some opclasses might not decrease the
size on every single cycle, and in any case, alignment roundoff
of the tuple size could obscure small gains.  Therefore, allow
up to 10 cycles without additional savings before throwing an
error.  (Perhaps this number will need adjustment, but it seems
quite generous right now.)

As long as we've developed this logic, let's back-patch it.
The back branches don't have INCLUDE columns to worry about, but
this seems like a good defense against possible bugs in operator
classes.  We already know that an infinite loop here is pretty
unpleasant, so having a defense seems to outweigh the risk of
breaking things.  (Note that spgtextproc.c is actually the only
known opclass with longValuesOK support, so that this is all moot
for known non-core opclasses anyway.)

Per report from Dilip Kumar.

Discussion: https://postgr.es/m/CAFiTN-uxP_soPhVG840tRMQTBmtA_f_Y8N51G7DKYYqDh7XN-A@mail.gmail.com

Fix query-cancel handling in spgdoinsert().

Knowing that a buggy opclass could cause an infinite insertion loop,
spgdoinsert() intended to allow its loop to be interrupted by query
cancel.  However, that never actually worked, because in iterations
after the first, we'd be holding buffer lock(s) which would cause
InterruptHoldoffCount to be positive, preventing servicing of the
interrupt.

To fix, check if an interrupt is pending, and if so fall out of
the insertion loop and service the interrupt after we've released
the buffers.  If it was indeed a query cancel, that's the end of
the matter.  If it was a non-canceling interrupt reason, make use
of the existing provision to retry the whole insertion.  (This isn't
as wasteful as it might seem, since any upper-level index tuples we
already created should be usable in the next attempt.)

While there's no known instance of such a bug in existing release
branches, it still seems like a good idea to back-patch this to
all supported branches, since the behavior is fairly nasty if a
loop does happen --- not only is it uncancelable, but it will
quickly consume memory to the point of an OOM failure.  In any
case, this code is certainly not working as intended.

Per report from Dilip Kumar.

Discussion: https://postgr.es/m/CAFiTN-uxP_soPhVG840tRMQTBmtA_f_Y8N51G7DKYYqDh7XN-A@mail.gmail.com

Refactor CHECK_FOR_INTERRUPTS() to add flexibility.

Split up CHECK_FOR_INTERRUPTS() to provide an additional macro
INTERRUPTS_PENDING_CONDITION(), which just tests whether an
interrupt is pending without attempting to service it. This is
useful in situations where the caller knows that interrupts are
blocked, and would like to find out if it's worth the trouble
to unblock them.

Also add INTERRUPTS_CAN_BE_PROCESSED(), which indicates whether
CHECK_FOR_INTERRUPTS() can be relied on to clear the pending interrupt.

This commit doesn't actually add any uses of the new macros,
but a follow-on bug fix will do so. Back-patch to all supported
branches to provide infrastructure for that fix.

Alvaro Herrera and Tom Lane

Discussion: https://postgr.es/m/20210513155351 [email protected]

Rename the logical replication global "wrconn"

The worker.c global wrconn is only meant to be used by logical apply/
tablesync workers, but there are other variables with the same name. To
reduce future confusion rename the global from "wrconn" to
"LogRepWorkerWalRcvConn".

While this is just cosmetic, it seems better to backpatch it all the way
back to 10 where this code appeared, to avoid future backpatching
issues.

Author: Peter Smith <[email protected]>
Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com

Stamp 10.17.

Last-minute updates for release notes.

Security: CVE-2021-32027, CVE-2021-32028, CVE-2021-32029

Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists.

It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE
list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present.
If it happens, the ON CONFLICT UPDATE code path would end up storing
tuples that include the values of the extra resjunk columns.  That's
fairly harmless in the short run, but if new columns are added to
the table then the values would become accessible, possibly leading
to malfunctions if they don't match the datatypes of the new columns.

This had escaped notice through a confluence of missing sanity checks,
including

* There's no cross-check that a tuple presented to heap_insert or
heap_update matches the table rowtype.  While it's difficult to
check that fully at reasonable cost, we can easily add assertions
that there aren't too many columns.

* The output-column-assignment cases in execExprInterp.c lacked
any sanity checks on the output column numbers, which seems like
an oversight considering there are plenty of assertion checks on
input column numbers.  Add assertions there too.

* We failed to apply nodeModifyTable's ExecCheckPlanOutput() to
the ON CONFLICT UPDATE tlist.  That wouldn't have caught this
specific error, since that function is chartered to ignore resjunk
columns; but it sure seems like a bad omission now that we've seen
this bug.

In HEAD, the right way to fix this is to make the processing of
ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists
now do, that is don't add "SET x = x" entries, and use
ExecBuildUpdateProjection to evaluate the tlist and combine it with
old values of the not-set columns.  This adds a little complication
to ExecBuildUpdateProjection, but allows removal of a comparable
amount of now-dead code from the planner.

In the back branches, the most expedient solution seems to be to
(a) use an output slot for the ON CONFLICT UPDATE projection that
actually matches the target table, and then (b) invent a variant of
ExecBuildProjectionInfo that can be told to not store values resulting
from resjunk columns, so it doesn't try to store into nonexistent
columns of the output slot.  (We can't simply ignore the resjunk columns
altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.)
This works back to v10.  In 9.6, projections work much differently and
we can't cheaply give them such an option.  The 9.6 version of this
patch works by inserting a JunkFilter when it's necessary to get rid
of resjunk columns.

In addition, v11 and up have the reverse problem when trying to
perform ON CONFLICT UPDATE on a partitioned table.  Through a
further oversight, adjust_partition_tlist() discarded resjunk columns
when re-ordering the ON CONFLICT UPDATE tlist to match a partition.
This accidentally prevented the storing-bogus-tuples problem, but
at the cost that MULTIEXPR_SUBLINK cases didn't work, typically
crashing if more than one row has to be updated.  Fix by preserving
resjunk columns in that routine.  (I failed to resist the temptation
to add more assertions there too, and to do some minor code
beautification.)

Per report from Andres Freund.  Back-patch to all supported branches.

Security: CVE-2021-32028

Prevent integer overflows in array subscripting calculations.

While we were (mostly) careful about ensuring that the dimensions of
arrays aren't large enough to cause integer overflow, the lower bound
values were generally not checked.  This allows situations where
lower_bound + dimension overflows an integer.  It seems that that's
harmless so far as array reading is concerned, except that array
elements with subscripts notionally exceeding INT_MAX are inaccessible.
However, it confuses various array-assignment logic, resulting in a
potential for memory stomps.

Fix by adding checks that array lower bounds aren't large enough to
cause lower_bound + dimension to overflow.  (Note: this results in
disallowing cases where the last subscript position would be exactly
INT_MAX.  In principle we could probably allow that, but there's a lot
of code that computes lower_bound + dimension and would need adjustment.
It seems doubtful that it's worth the trouble/risk to allow it.)

Somewhat independently of that, array_set_element() was careless
about possible overflow when checking the subscript of a fixed-length
array, creating a different route to memory stomps.  Fix that too.

Security: CVE-2021-32027

Translation updates

Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: f89e869abf8730f74d0a75c59436fd02ab7840b1

Release notes for 13.3, 12.7, 11.12, 10.17, 9.6.22.

AlterSubscription_refresh: avoid stomping on global variable

This patch replaces use of the global "wrconn" variable in
AlterSubscription_refresh with a local variable of the same name, making
it consistent with other functions in subscriptioncmds.c (e.g.
DropSubscription).

The global wrconn is only meant to be used for logical apply/tablesync worker.
Abusing it this way is known to cause trouble if an apply worker
manages to do a subscription refresh, such as reported by Jeremy Finzel
and diagnosed by Andres Freund back in November 2020, at
https://www.postgresql.org/message-id/20201111215820 [email protected]

Backpatch to 10. In branch master, also move the connection establishment
to occur outside the PG_TRY block; this way we can remove a test for NULL in
PG_FINALLY, and it also makes the code more consistent with similar code in
the same file.

Author: Peter Smith <[email protected]>
Reviewed-by: Bharath Rupireddy <[email protected]>
Reviewed-by: Japin Li <[email protected]>
Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com

Document lock level used by ALTER TABLE VALIDATE CONSTRAINT

Backpatch all the way back to 9.6.

Author: Simon Riggs <[email protected]>
Discussion: https://postgr.es/m/CANbhV-EwxvdhHuOLdfG2ciYrHOHXV=mm6=fD5aMhqcH09Li3Tg@mail.gmail.com

Doc: add an example of a self-referential foreign key to ddl.sgml.

While we've always allowed such cases, the documentation didn't
say you could do it.

Discussion: https://postgr.es/m/161969805833.690.13680986983883602407@wrigleys.postgresql.org

Doc: update libpq's documentation for PQfn().

Mention specifically that you can't call aggregates, window functions,
or procedures this way (the inability to call SRFs was already
mentioned).

Also, the claim that PQfn doesn't support NULL arguments or results
has been a lie since we invented protocol 3.0. Not sure why this
text was never updated for that, but do it now.

Discussion: https://postgr.es/m/2039442.1615317309@sss.pgh.pa.us

Disallow calling anything but plain functions via the fastpath API.

Reject aggregates, window functions, and procedures.  Aggregates
failed anyway, though with a somewhat obscure error message.
Window functions would hit an Assert or null-pointer dereference.
Procedures seemed to work as long as you didn't try to do
transaction control, but (a) transaction control is sort of the
point of a procedure, and (b) it's not entirely clear that no
bugs lurk in that path.  Given the lack of testing of this area,
it seems safest to be conservative in what we support.

Also reject proretset functions, as the fastpath protocol can't
support returning a set.

Also remove an easily-triggered assertion that the given OID
isn't 0; the subsequent lookups can handle that case themselves.

Per report from Theodor-Arsenij Larionov-Trichkin.
Back-patch to all supported branches.  (The procedure angle
only applies in v11+, of course.)

Discussion: https://postgr.es/m/2039442.1615317309@sss.pgh.pa.us

Fix some more omissions in pg_upgrade's tests for non-upgradable types.

Commits 29aeda6e4 et al closed up some oversights involving not checking
for non-upgradable types within container types, such as arrays and
ranges.  However, I only looked at version.c, failing to notice that
there were substantially-equivalent tests in check.c.  (The division
of responsibility between those files is less than clear...)

In addition, because genbki.pl does not guarantee that auto-generated
rowtype OIDs will hold still across versions, we need to consider that
the composite type associated with a system catalog or view is
non-upgradable.  It seems unlikely that someone would have a user
column declared that way, but if they did, trying to read it in another
PG version would likely draw "no such pg_type OID" failures, thanks
to the type OID embedded in composite Datums.

To support the composite and reg*-type cases, extend the recursive
query that does the search to allow any base query that returns
a column of pg_type OIDs, rather than limiting it to exactly one
starting type.

As before, back-patch to all supported branches.

Discussion: https://postgr.es/m/2798740.1619622555@sss.pgh.pa.us

Doc: fix discussion of how to get real Julian Dates.

Somehow I'd convinced myself that rotating to UTC-12 was the way
to do this, but upon further review, it's definitely UTC+12.

Discussion: https://postgr.es/m/1197050.1619123213@sss.pgh.pa.us

Fix use-after-release issue with pg_identify_object_as_address()

Spotted by buildfarm member prion, with -DRELCACHE_FORCE_RELEASE.

Introduced in f7aab36.

Discussion: https://postgr.es/m/2759018.1619577848@sss.pgh.pa.us
Backpatch-through: 9.6

Fix pg_identify_object_as_address() with event triggers

Attempting to use this function with event triggers failed, as, since
its introduction in a676201, this code has never associated an object
name with event triggers. This addresses the failure by adding the
event trigger name to the set defining its object address.

Note that regression tests are added within event_trigger and not
object_address to avoid issues with concurrent connections in parallel
schedules.

Author: Joel Jacobson
Discussion: https://postgr.es/m/3c905e77-a026-46ae-8835-c3f6cd1d24c8@www.fastmail.com
Backpatch-through: 9.6

Doc: document EXTRACT(JULIAN ...), improve Julian Date explanation.

For some reason, the "julian" option for extract()/date_part() has
never gotten listed in the manual. Also, while Appendix B mentioned
in passing that we don't conform to the usual astronomical definition
that a Julian date starts at noon UTC, it was kind of vague about what
we do instead. Clarify that, and add an example showing how to get
the astronomical definition if you want it.

It's been like this for ages, so back-patch to all supported branches.

Discussion: https://postgr.es/m/1197050.1619123213@sss.pgh.pa.us

doc: Fix obsolete description about pg_basebackup.

Previously it was documented that if using "-X none" option there was
no guarantee that all required WAL files were archived at the end of
pg_basebackup when taking a backup from the standby. But this limitation
was removed by commit 52f8a59dd9. Now, even when taking a backup
from the standby, pg_basebackup can wait for all required WAL files
to be archived. Therefore this commit removes such obsolete
description from the docs.

Also this commit adds new description about the limitation when
taking a backup from the standby, into the docs. The limitation is that
pg_basebackup cannot force the standbfy to switch to a new WAL file
at the end of backup, which may cause pg_basebackup to wait a long
time for the last required WAL file to be switched and archived,
especially when write activity on the primary is low.

Back-patch to v10 where the issue was introduced.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi, Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/20210420.133235.1342729068750553399 [email protected]

fix silly perl error in commit d064afc720

Only ever test for non-127.0.0.1 addresses on Windows in PostgresNode

This has been found to cause hangs where tcp usage is forced.

Alexey Kodratov

Discussion: https://postgr.es/m/82e271a9a11928337fcb5b5e57b423c0@postgrespro.ru

Backpatch to all live branches

Allow TestLib::slurp_file to skip contents, and use as needed

In order to avoid getting old logfile contents certain functions in
PostgresNode were doing one of two things. On Windows it rotated the
logfile and restarted the server, while elsewhere it truncated the log
file. Both of these are unnecessary. We borrow from the buildfarm which
does this instead: note the size of the logfile before we start, and
then when fetching the logfile skip to that position before accumulating
contents. This is spelled differently on Windows but the effect is the
same. This is largely centralized in TestLib's slurp_file function,
which has a new optional parameter, the offset to skip to before
starting to reading the file. Code in the client becomes much neater.

Backpatch to all live branches.

Michael Paquier, slightly modified by me.

Discussion: https://postgr.es/m/[email protected]

Fix some inappropriately-disallowed uses of ALTER ROLE/DATABASE SET.

Most GUC check hooks that inspect database state have special checks
that prevent them from throwing hard errors for state-dependent issues
when source == PGC_S_TEST.  This allows, for example,
"ALTER DATABASE d SET default_text_search_config = foo" when the "foo"
configuration hasn't been created yet.  Without this, we have problems
during dump/reload or pg_upgrade, because pg_dump has no idea about
possible dependencies of GUC values and can't ensure a safe restore
ordering.

However, check_role() and check_session_authorization() hadn't gotten
the memo about that, and would throw hard errors anyway.  It's not
entirely clear what is the use-case for "ALTER ROLE x SET role = y",
but we've now heard two independent complaints about that bollixing
an upgrade, so apparently some people are doing it.

Hence, fix these two functions to act more like other check hooks
with similar needs.  (But I did not change their insistence on
being inside a transaction, as it's still not apparent that setting
either GUC from the configuration file would be wise.)

Also fix check_temp_buffers, which had a different form of the disease
of making state-dependent checks without any exception for PGC_S_TEST.
A cursory survey of other GUC check hooks did not find any more issues
of this ilk.  (There are a lot of interdependencies among
PGC_POSTMASTER and PGC_SIGHUP GUCs, which may be a bad idea, but
they're not relevant to the immediate concern because they can't be
set via ALTER ROLE/DATABASE.)

Per reports from Charlie Hornsby and Nathan Bossart.  Back-patch
to all supported branches.

Discussion: https://postgr.es/m/HE1P189MB0523B31598B0C772C908088DB7709@HE1P189MB0523.EURP189.PROD.OUTLOOK.COM
Discussion: https://postgr.es/m/20160711223641 [email protected]

Use "-I." in directories holding Bison parsers, for Oracle compilers.

With the Oracle Developer Studio 12.6 compiler, #line directives alter
the current source file location for purposes of #include "..."
directives.  Hence, a VPATH build failed with 'cannot find include file:
"specscanner.c"'.  With two exceptions, parser-containing directories
already add "-I. -I$(srcdir)"; eliminate the exceptions.  Back-patch to
9.6 (all supported versions).

Port regress-python3-mangle.mk to Solaris "sed".

It doesn't support "$foo$*" like a POSIX "sed" implementation does;
see the Autoconf manual. Back-patch to 9.6 (all supported versions).

Fix old bug with coercing the result of a COLLATE expression.

There are hacks in parse_coerce.c to push down a requested coercion
to below any CollateExpr that may appear.  However, we did that even
if the requested data type is non-collatable, leading to an invalid
expression tree in which CollateExpr is applied to a non-collatable
type.  The fix is just to drop the CollateExpr altogether, reasoning
that it's useless.

This bug is ten years old, dating to the original addition of
COLLATE support.  The lack of field complaints suggests that there
aren't a lot of user-visible consequences.  We noticed the problem
because it would trigger an assertion in DefineVirtualRelation if
the invalid structure appears as an output column of a view; however,
in a non-assert build, you don't see a crash just a (subtly incorrect)
complaint about applying collation to a non-collatable type.  I found
that by putting the incorrect structure further down in a view, I could
make a view definition that would fail dump/reload, per the added
regression test case.  But CollateExpr doesn't do anything at run-time,
so this likely doesn't lead to any really exciting consequences.

Per report from Yulin Pei.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/HK0PR01MB22744393C474D503E16C8509F4709@HK0PR01MB2274.apcprd01.prod.exchangelabs.com

Fix out-of-bound memory access for interval -> char conversion

Using Roman numbers (via "RM" or "rm") for a conversion to calculate a
number of months has never considered the case of negative numbers,
where a conversion could easily cause out-of-bound memory accesses. The
conversions in themselves were not completely consistent either, as
specifying 12 would result in NULL, but it should mean XII.

This commit reworks the conversion calculation to have a more
consistent behavior:
- If the number of months and years is 0, return NULL.
- If the number of months is positive, return the exact month number.
- If the number of months is negative, do a backward calculation, with
-1 meaning December, -2 November, etc.

Reported-by: Theodor Arsenij Larionov-Trichkin
Author: Julien Rouhaud
Discussion: https://postgr.es/m/16953-f255a18f8c51f1d5@postgresql.org
backpatch-through: 9.6

Fix typo

Author: Daniel Westermann
Backpatch-through: 9.6
Discussion: https://postgr.es/m/GV0P278MB0483A7AA85BAFCC06D90F453D2739@GV0P278MB0483.CHEP278.PROD.OUTLOOK.COM

Fix typos and grammar in documentation and code comments

Comment fixes are applied on HEAD, and documentation improvements are
applied on back-branches where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210408164008 [email protected]
Backpatch-through: 9.6

Don't add non-existent pages to bitmap from BRIN

The code in bringetbitmap() simply added the whole matching page range
to the TID bitmap, as determined by pages_per_range, even if some of the
pages were beyond the end of the heap. The query then might fail with
an error like this:

  ERROR:  could not open file "base/20176/20228.2" (target block
          262144): previous segment is only 131021 blocks

In this case, the relation has 262093 pages (131072 and 131021 pages),
but we're trying to acess block 262144, i.e. first block of the 3rd
segment. At that point _mdfd_getseg() notices the preceding segment is
incomplete, and fails.

Hitting this in practice is rather unlikely, because:

* Most indexes use power-of-two ranges, so segments and page ranges
  align perfectly (segment end is also a page range end).

* The table size has to be just right, with the last segment being
  almost full - less than one page range from full segment, so that the
  last page range actually crosses the segment boundary.

* Prefetch has to be enabled. The regular page access checks that
  pages are not beyond heap end, but prefetch does not. On older
  releases (before 12) the execution stops after hitting the first
  non-existent page, so the prefetch distance has to be sufficient
  to reach the first page in the next segment to trigger the issue.
  Since 12 it's enough to just have prefetch enabled, the prefetch
  distance does not matter.

Fixed by not adding non-existent pages to the TID bitmap. Backpatch
all the way back to 9.6 (BRIN indexes were introduced in 9.5, but that
release is EOL).

Backpatch-through: 9.6

Shut down transaction tracking at startup process exit.

Maxim Orlov reported that the shutdown of standby server could result in
the following assertion failure. The cause of this issue was that,
when the shutdown caused the startup process to exit, recovery-time
transaction tracking was not shut down even if it's already initialized,
and some locks the tracked transactions were holding could not be released.
At this situation, if other process was invoked and the PGPROC entry that
the startup process used was assigned to it, it found such unreleased locks
and caused the assertion failure, during the initialization of it.

TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))"

This commit fixes this issue by making the startup process shut down
transaction tracking and release all locks, at the exit of it.

Back-patch to all supported branches.

Reported-by: Maxim Orlov
Author: Fujii Masao
Reviewed-by: Maxim Orlov
Discussion: https://postgr.es/m/ad4ce692cc1d89a093b471ab1d969b0b@postgrespro.ru

Use macro MONTHS_PER_YEAR instead of '12' in /ecpg/pgtypeslib

All other places already use MONTHS_PER_YEAR appropriately.

Backpatch-through: 9.6

Clarify documentation of RESET ROLE

Command-line options, or previous "ALTER (ROLE|DATABASE) ...
SET ROLE ..." commands, can change the value of the default role
for a session. In the presence of one of these, RESET ROLE will
change the current user identifier to the default role rather
than the session user identifier. Fix the documentation to
reflect this reality. Backpatch to all supported versions.

Author: Nathan Bossart
Reviewed-By: Laurenz Albe, David G. Johnston, Joe Conway
Reported by: Nathan Bossart
Discussion: https://postgr.es/m/flat/925134DB-8212-4F60-8AB1-B1231D750CB4%40amazon.com
Backpatch-through: 9.6

doc: Clarify how to generate backup files with non-exclusive backups

The current instructions describing how to write the backup_label and
tablespace_map files are confusing. For example, opening a file in text
mode on Windows and copy-pasting the file's contents would result in a
failure at recovery because of the extra CRLF characters generated. The
documentation was not stating that clearly, and per discussion this is
not considered as a supported scenario.

This commit extends a bit the documentation to mention that it may be
required to open the file in binary mode before writing its data.

Reported-by: Wang Shenhao
Author: David Steele
Reviewed-by: Andrew Dunstan, Magnus Hagander
Discussion: https://postgr.es/m/8373f61426074f2cb6be92e02f838389@G08CNEXMBPEKD06.g08.fujitsu.local
Backpatch-through: 9.6

doc: mention that intervening major releases can be skipped

Also mention that you should read the intervening major releases notes.
This change was also applied to the website.

Discussion: https://postgr.es/m/20210330144949 [email protected]

Backpatch-through: 9.6

Fix pg_restore's misdesigned code for detecting archive file format.

Despite the clear comments pointing out that the duplicative code
segments in ReadHead() and _discoverArchiveFormat() needed to be
in sync, they were not: the latter did not bother to apply any of
the sanity checks in the former.  We'd missed noticing this partly
because none of those checks would fail in scenarios we customarily
test, and partly because the oversight would be masked if both
segments execute, which they would in cases other than needing to
autodetect the format of a non-seekable stdin source.  However,
in a case meeting all these requirements --- for example, trying
to read a newer-than-supported archive format from non-seekable
stdin --- pg_restore missed applying the version check and would
likely dump core or otherwise misbehave.

The whole thing is silly anyway, because there seems little reason
to duplicate the logic beyond the one-line verification that the
file starts with "PGDMP".  There seems to have been an undocumented
assumption that multiple major formats (major enough to require
separate reader modules) would nonetheless share the first half-dozen
fields of the custom-format header.  This seems unlikely, so let's
fix it by just nuking the duplicate logic in _discoverArchiveFormat().

Also get rid of the pointless attempt to seek back to the start of
the file after successful autodetection.  That wastes cycles and
it means we have four behaviors to verify not two.

Per bug #16951 from Sergey Koposov.  This has been broken for
decades, so back-patch to all supported versions.

Discussion: https://postgr.es/m/16951-a4dd68cf0de23048@postgresql.org

doc: Clarify use of ACCESS EXCLUSIVE lock in various sections

Some sections of the documentation used "exclusive lock" to describe
that an ACCESS EXCLUSIVE lock is taken during a given operation. This
can be confusing to the reader as ACCESS SHARE is allowed with an
EXCLUSIVE lock is used, but that would not be the case with what is
described on those parts of the documentation.

Author: Greg Rychlewski
Discussion: https://postgr.es/m/CAKemG7VptD=7fNWckFMsMVZL_zzvgDO6v2yVmQ+ZiBfc_06kCQ@mail.gmail.com
Backpatch-through: 9.6

Add a docs section for obsoleted and renamed functions and settings

The new appendix groups information on renamed or removed settings,
commands, etc into an out-of-the-way part of the docs.

The original id elements are retained in each subsection to ensure that
the same filenames are produced for HTML docs. This prevents /current/
links on the web from breaking, and allows users of the web docs
to follow links from old version pages to info on the changes in the
new version. Prior to this change, a link to /current/ for renamed
sections like the recovery.conf docs would just 404. Similarly if
someone searched for recovery.conf they would find the pg11 docs,
but there would be no /12/ or /current/ link, so they couldn't easily
find out that it was removed in pg12 or how to adapt.

Index entries are also added so that there's a breadcrumb trail for
users to follow when they know the old name, but not what we changed it
to. So a user who is trying to find out how to set standby_mode in
PostgreSQL 12+, or where pg_resetxlog went, now has more chance of
finding that information.

Craig Ringer and Stephen Frost
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/CAGRY4nzPNOyYQ_1-pWYToUVqQ0ThqP5jdURnJMZPm539fdizOg%40mail.gmail.com
Backpatch-through: 10

Update obsolete comment.

Back-patch to all supported branches.

Author: Etsuro Fujita
Discussion: https://postgr.es/m/CAPmGK17DwzaSf%2BB71dhL2apXdtG-OmD6u2AL9Cq2ZmAR0%2BzapQ%40mail.gmail.com

doc: Define TLS as an acronym

Commit c6763156589 added an acronym reference for "TLS" but the definition
was never added.

Author: Daniel Gustafsson
Reviewed-by: Michael Paquier
Backpatch-through: 9.6
Discussion: https://postgr.es/m/27109504-82DB-41A8-8E63-C0498314F5B0@yesql.se

Fix ndistinct estimates with system attributes

When estimating the number of groups using extended statistics, the code
was discarding information about system attributes. This led to strange
situation that

SELECT 1 FROM t GROUP BY ctid;

could have produced higher estimate (equal to pg_class.reltuples) than

SELECT 1 FROM t GROUP BY a, b, ctid;

with extended statistics on (a,b). Fixed by retaining information about
the system attribute.

Backpatch all the way to 10, where extended statistics were introduced.

Author: Tomas Vondra
Backpatch-through: 10

Fix bug in WAL replay of COMMIT_TS_SETTS record.

Previously the WAL replay of COMMIT_TS_SETTS record called
TransactionTreeSetCommitTsData() with the argument write_xlog=true,
which generated and wrote new COMMIT_TS_SETTS record.
This should not be acceptable because it's during recovery.

This commit fixes the WAL replay of COMMIT_TS_SETTS record
so that it calls TransactionTreeSetCommitTsData() with write_xlog=false
and doesn't generate new WAL during recovery.

Back-patch to all supported branches.

Reported-by: lx zou <[email protected]>
Author: Fujii Masao
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/16931-620d0f2fdc6108f1@postgresql.org

Fix psql's \connect command some more.

Jasen Betts reported yet another unintended side effect of commit
85c54287a: reconnecting with "\c service=whatever" did not have the
expected results.  The reason is that starting from the output of
PQconndefaults() effectively allows environment variables (such
as PGPORT) to override entries in the service file, whereas the
normal priority is the other way around.

Not using PQconndefaults at all would require yet a third main code
path in do_connect's parameter setup, so I don't really want to fix
it that way.  But we can have the logic effectively ignore all the
default values for just a couple more lines of code.

This patch doesn't change the behavior for "\c -reuse-previous=on
service=whatever".  That remains significantly different from before
85c54287a, because many more parameters will be re-used, and thus
not be possible for service entries to replace.  But I think this
is (mostly?) intentional.  In any case, since libpq does not report
where it got parameter values from, it's hard to do differently.

Per bug #16936 from Jasen Betts.  As with the previous patches,
back-patch to all supported branches.  (9.5 is unfortunately now
out of support, so this won't get fixed there.)

Discussion: https://postgr.es/m/16936-3f524322a53a29f0@postgresql.org

Use correct spelling of statistics kind

A couple error messages and comments used 'statistic kind', not the
correct 'statistics kind'. Fix and backpack all the way back to 10,
where extended statistics were introduced.

Backpatch-through: 10

pg_waldump: Fix bug in per-record statistics.

pg_waldump --stats=record identifies a record by a combination
of the RmgrId and the four bits of the xl_info field of the record.
But XACT records use the first bit of those four bits for an optional
flag variable, and the following three bits for the opcode to
identify a record. So previously the same type of XACT record
could have different four bits (three bits are the same but the
first one bit is different), and which could cause
pg_waldump --stats=record to show two lines of per-record statistics
for the same XACT record. This is a bug.

This commit changes pg_waldump --stats=record so that it processes
only XACT record differently, i.e., filters the opcode out of xl_info
and uses a combination of the RmgrId and those three bits as
the identifier of a record, only for XACT record. For other records,
the four bits of the xl_info field are still used.

Back-patch to all supported branches.

Author: Kyotaro Horiguchi
Reviewed-by: Shinya Kato, Fujii Masao
Discussion: https://postgr.es/m/2020100913412132258847@highgo.ca

Fix new TAP test for 2PC transactions and PITRs on Windows

The test added by 595b9cb forgot that on Windows it is necessary to set
up pg_hba.conf (see PostgresNode::set_replication_conf) with a specific
entry or base backups fail.  Any node that requires to support
replication just needs to pass down allows_streaming at initialization.
This updates the test to do so.  Simplify things a bit while on it.

Per buildfarm member fairywren.  Any Windows hosts running this test
would have failed, and I have reproduced the problem as well.

Backpatch-through: 10

Fix timeline assignment in checkpoints with 2PC transactions

Any transactions found as still prepared by a checkpoint have their
state data read from the WAL records generated by PREPARE TRANSACTION
before being moved into their new location within pg_twophase/.  While
reading such records, the WAL reader uses the callback
read_local_xlog_page() to read a page, that is shared across various
parts of the system.  This callback, since 1148e22a, has introduced an
update of ThisTimeLineID when reading a record while in recovery, which
is potentially helpful in the context of cascading WAL senders.

This update of ThisTimeLineID interacts badly with the checkpointer if a
promotion happens while some 2PC data is read from its record, as, by
changing ThisTimeLineID, any follow-up WAL records would be written to
an timeline older than the promoted one.  This results in consistency
issues.  For instance, a subsequent server restart would cause a failure
in finding a valid checkpoint record, resulting in a PANIC, for
instance.

This commit changes the code reading the 2PC data to reset the timeline
once the 2PC record has been read, to prevent messing up with the static
state of the checkpointer.  It would be tempting to do the same thing
directly in read_local_xlog_page().  However, based on the discussion
that has led to 1148e22a, users may rely on the updates of
ThisTimeLineID when a WAL record page is read in recovery, so changing
this callback could break some cases that are working currently.

A TAP test reproducing the issue is added, relying on a PITR to
precisely trigger a promotion with a prepared transaction still
tracked.

Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao
and myself.

Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
Backpatch-through: 10

Fix memory leak when rejecting bogus DH parameters.

While back-patching e0e569e1d, I noted that there were some other
places where we ought to be applying DH_free(); namely, where we
load some DH parameters from a file and then reject them as not
being sufficiently secure. While it seems really unlikely that
anybody would hit these code paths in production, let alone do
so repeatedly, let's fix it for consistency.

Back-patch to v10 where this code was introduced.

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org

Fix memory leak when initializing DH parameters in backend

When loading DH parameters used for the generation of ephemeral DH keys
in the backend, the code has never bothered releasing the memory used
for the DH information loaded from a file or from libpq's default. This
commit makes sure that the information is properly free()'d.

Back-patch of e0e569e1d. We originally thought the leak was minor and
not worth back-patching, but Jelte Fennema pointed out that repeated
SIGHUP's can result in very serious bloat of the postmaster, which is
then multiplied by being duplicated into eadh forked child.

Back-patch to v10; the code looked different before c0a15e07c,
and didn't have a leak in the actually-live code paths.

Michael Paquier

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org

Don't leak malloc'd error string in libpqrcv_check_conninfo().

We leaked the error report from PQconninfoParse, when there was
one. It seems unlikely that real usage patterns would repeat
the failure often enough to create serious bloat, but let's
back-patch anyway to keep the code similar in all branches.

Found via valgrind testing.
Back-patch to v10 where this code was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us

Don't leak malloc'd strings when a GUC setting is rejected.

Because guc.c prefers to keep all its string values in malloc'd
not palloc'd storage, it has to be more careful than usual to
avoid leaks. Error exits out of string GUC hook checks failed
to clear the proposed value string, and error exits out of
ProcessGUCArray() failed to clear the malloc'd results of
ParseLongOption().

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us

Don't leak compiled regex(es) when an ispell cache entry is dropped.

The text search cache mechanisms assume that we can clean up
an invalidated dictionary cache entry simply by resetting the
associated long-lived memory context.  However, that does not work
for ispell affixes that make use of regular expressions, because
the regex library deals in plain old malloc.  Hence, we leaked
compiled regex(es) any time we dropped such a cache entry.  That
could quickly add up, since even a fairly trivial regex can use up
tens of kB, and a large one can eat megabytes.  Add a memory context
callback to ensure that a regex gets freed when its owning cache
entry is cleared.

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us

Don't leak rd_statlist when a relcache entry is dropped.

Although these lists are usually NIL, and even when not empty
are unlikely to be large, constant relcache update traffic could
eventually result in visible bloat of CacheMemoryContext.

Found via valgrind testing.
Back-patch to v10 where this field was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us

Prevent buffer overrun in read_tablespace_map().

Robert Foggia of Trustwave reported that read_tablespace_map()
fails to prevent an overrun of its on-stack input buffer.
Since the tablespace map file is presumed trustworthy, this does
not seem like an interesting security vulnerability, but still
we should fix it just in the name of robustness.

While here, document that pg_basebackup's --tablespace-mapping option
doesn't work with tar-format output, because it doesn't. To make it
work, we'd have to modify the tablespace_map file within the tarball
sent by the server, which might be possible but I'm not volunteering.
(Less-painful solutions would require changing the basebackup protocol
so that the source server could adjust the map. That's not very
appetizing either.)

Avoid corner-case memory leak in SSL parameter processing.

After reading the root cert list from the ssl_ca_file, immediately
install it as client CA list of the new SSL context. That gives the
SSL context ownership of the list, so that SSL_CTX_free will free it.
This avoids a permanent memory leak if we fail further down in
be_tls_init(), which could happen if bogus CRL data is offered.

The leak could only amount to something if the CRL parameters get
broken after server start (else we'd just quit) and then the server
is SIGHUP'd many times without fixing the CRL data. That's rather
unlikely perhaps, but it seems worth fixing, if only because the
code is clearer this way.

While we're here, add some comments about the memory management
aspects of this logic.

Noted by Jelte Fennema and independently by Andres Freund.
Back-patch to v10; before commit de41869b6 it doesn't matter,
since we'd not re-execute this code during SIGHUP.

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org

Fix race condition in psql \e's detection of file modification.

psql's editing commands decide whether the user has edited the file
by checking for change of modification timestamp. This is probably
fine for a pre-existing file, but with a temporary file that is
created within the command, it's possible for a fast typist to
save-and-exit in less than the one-second granularity of stat(2)
timestamps. On Windows FAT filesystems the granularity is even
worse, 2 seconds, making the race a bit easier to hit.

To fix, try to set the temp file's mod time to be two seconds ago.
It's unlikely this would fail, but then again the race condition
itself is unlikely, so just ignore any error.

Also, we might as well check the file size as well as its mod time.

While this is a difficult bug to hit, it still seems worth
back-patching, to ensure that users' edits aren't lost.

Laurenz Albe, per gripe from Jacob Champion; based on fix suggestions
from Jacob and myself

Discussion: https://postgr.es/m/0ba3f2a658bac6546d9934ab6ba63a805d46a49b [email protected]

Forbid marking an identity column as nullable.

GENERATED ALWAYS AS IDENTITY implies NOT NULL, but the code failed
to complain if you overrode that with "GENERATED ALWAYS AS IDENTITY
NULL". One might think the old behavior was a feature, but it was
inconsistent because the outcome varied depending on the order of
the clauses, so it seems to have been just an oversight.

Per bug #16913 from Pavel Boev. Back-patch to v10 where identity
columns were introduced.

Vik Fearing (minor tweaks by me)

Discussion: https://postgr.es/m/16913-3b5198410f67d8c6@postgresql.org

Re-simplify management of inStart in pqParseInput3's subroutines.

Commit 92785dac2 copied some logic related to advancement of inStart
from pqParseInput3 into getRowDescriptions and getAnotherTuple,
because it wanted to allow user-defined row processor callbacks to
potentially longjmp out of the library, and inStart would have to be
updated before that happened to avoid an infinite loop.  We later
decided that that API was impossibly fragile and reverted it, but
we didn't undo all of the related code changes, and this bit of
messiness survived.  Undo it now so that there's just one place in
pqParseInput3's processing where inStart is advanced; this will
simplify addition of better tracing support.

getParamDescriptions had grown similar processing somewhere along
the way (not in 92785dac2; I didn't track down just when), but it's
actually buggy because its handling of corrupt-message cases seems to
have been copied from the v2 logic where we lacked a known message
length.  The cases where we "goto not_enough_data" should not simply
return EOF, because then we won't consume the message, potentially
creating an infinite loop.  That situation now represents a
definitively corrupt message, and we should report it as such.

Although no field reports of getParamDescriptions getting stuck in
a loop have been seen, it seems appropriate to back-patch that fix.
I chose to back-patch all of this to keep the logic looking more alike
in supported branches.

Discussion: https://postgr.es/m/2217283.1615411989@sss.pgh.pa.us

tutorial:  land height is "elevation", not "altitude"

This is a follow-on patch to 92c12e46d5.  In that patch, we renamed
"altitude" to "elevation" in the docs, based on these details:

   https://mapscaping.com/blogs/geo-candy/what-is-the-difference-between-elevation-relief-and-altitude

This renames the tutorial SQL files to match the documentation.

Reported-by: [email protected]
Discussion: https://postgr.es/m/161512392887.1046.3137472627109459518@wrigleys.postgresql.org

Backpatch-through: 9.6

Validate the OID argument of pg_import_system_collations().

"SELECT pg_import_system_collations(0)" caused an assertion failure.
With a random nonzero argument --- or indeed with zero, in non-assert
builds --- it would happily make pg_collation entries with garbage
values of collnamespace. These are harmless as far as I can tell
(unless maybe the OID happens to become used for a schema, later on?).
In any case this isn't a security issue, since the function is
superuser-only. But it seems like a gotcha for unwary DBAs, so let's
add a check that the given OID belongs to some schema.

Back-patch to v10 where this function was introduced.

Clarify the usage of max_replication_slots on the subscriber side.

It was not clear in the docs that the max_replication_slots is also used
to track replication origins on the subscriber side.

Author: Paul Martinez
Reviewed-by: Amit Kapila
Backpatch-through: 10 where logical replication was introduced
Discussion: https://postgr.es/m/CACqFVBZgwCN_pHnW6dMNCrOS7tiHCw6Retf_=U2Vvj3aUSeATw@mail.gmail.com

Use native path separators to pg_ctl in initdb

On Windows, CMD.EXE allegedly does not run a command that uses forward slashes,
so let's convert the path to use backslashes instead.

Backpatch to 10.

Author: Nitin Jadhav <[email protected]>
Reviewed-by: Juan José Santamaría Flecha <[email protected]>
Discussion: https://postgr.es/m/CAMm1aWaNDuaPYFYMAqDeJrZmPtNvLcJRS++CcZWY8LT6KcoBZw@mail.gmail.com