Tom Lane [Mon, 25 Mar 2019 21:18:06 +0000 (17:18 -0400)]
Doc: clarify that REASSIGN OWNED doesn't handle default privileges.
It doesn't touch regular privileges either, but only the latter was
explicitly stated.
Discussion: https://postgr.es/m/155348282848.9808.12629518043813943231@wrigleys.postgresql.org
Tom Lane [Sun, 24 Mar 2019 19:13:21 +0000 (15:13 -0400)]
Avoid double-free in vacuumlo error path.
The code would do "PQclear(res)" twice if lo_unlink failed, evidently
due to careless thinking about how far out a "break" would break.
Remove the extra PQclear and adjust the loop logic so that we'll fall
out of both levels of loop after an error, as was clearly the intent.
Spotted by Coverity. I have no idea why it took this long to notice,
since the bug has been there since commit 67ccbb080. Accordingly,
back-patch to all supported branches.
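The trap described above is that "break" leaves only the innermost loop. Below is a minimal C sketch of the corrected control flow. It assumes hypothetical stand-ins, fake_clear() and fake_unlink(), in place of libpq's PQclear() and lo_unlink(), so that the number of releases can be counted. An error flag ends both loops, and each batch's result is released exactly once.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins: fake_clear() plays the role of PQclear(res) and
 * only counts calls; fake_unlink() plays lo_unlink() and fails on one OID. */
static int clear_calls = 0;
static void fake_clear(void) { clear_calls++; }
static bool fake_unlink(int oid, int failing_oid) { return oid != failing_oid; }

/* Process nbatches batches of per_batch OIDs. Returns 1 if an unlink failed. */
static int process_batches(int nbatches, int per_batch, int failing_oid)
{
    bool failed = false;

    /* The outer loop also tests the flag, so an inner failure ends both. */
    for (int b = 0; b < nbatches && !failed; b++)
    {
        /* (a batch result would be fetched here) */
        for (int i = 0; i < per_batch; i++)
        {
            int oid = b * per_batch + i;

            if (!fake_unlink(oid, failing_oid))
            {
                fprintf(stderr, "failed to remove large object %d\n", oid);
                failed = true;
                break;          /* exits only the inner loop ... */
            }
        }
        fake_clear();           /* ... so this is the single release */
    }
    return failed ? 1 : 0;
}
```

For example, process_batches(3, 4, 5) fails on OID 5 in the second batch, releases two results rather than three, and never starts the third batch.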
Alexander Korotkov [Sun, 24 Mar 2019 12:26:45 +0000 (15:26 +0300)]
Fix WAL format incompatibility introduced by backpatching of 52ac6cd2d0
52ac6cd2d0 added a new field to ginxlogDeletePage and was backpatched to 9.4.
That led to problems when a patched postgres instance applies WAL records
generated by a non-patched one: WAL records generated by a non-patched
instance don't contain the new field, which the patched one expects to see.
Thankfully, we can distinguish patched and non-patched WAL records by their data
size. If we see that a WAL record was generated by a non-patched instance, we
skip processing of the new field. This commit comes with some assertions. In
particular, if on some platform the struct size turns out not to have changed,
a static assertion will trigger.
Reported-by: Simon Riggs
Discussion: https://postgr.es/m/CANP8%2Bj%2BK4whxf7ET7%2BgO%2BG-baC3-WxqqH%3DnV4X2CgfEPA3Yu3g%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Simon Riggs, Alvaro Herrera
Backpatch-through: 9.4
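The size-based detection described above can be sketched in C roughly as follows. This is an illustration under stated assumptions: DeletePageV1, DeletePageV2, and decode_delete_page() are hypothetical names, and the fields do not reproduce the real ginxlogDeletePage layout. The static assertion mirrors the commit's safeguard: if padding ever made both layouts the same size, detection would be impossible, so the build fails instead of silently misreading WAL.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record layouts: the newer one appends a single field. */
typedef struct
{
    uint32_t parentOffset;
    uint32_t rightLink;
} DeletePageV1;

typedef struct
{
    uint32_t parentOffset;
    uint32_t rightLink;
    uint32_t deleteXid;         /* field added by the newer format */
} DeletePageV2;

/* Size is the only version marker, so the sizes must differ. */
_Static_assert(sizeof(DeletePageV1) != sizeof(DeletePageV2),
               "record formats must differ in size");

/* Decode a record of len bytes in either format into *out.
 * Returns 0 on success, -1 if len matches neither format. */
static int decode_delete_page(const char *data, size_t len, DeletePageV2 *out)
{
    memset(out, 0, sizeof(*out));
    if (len == sizeof(DeletePageV2))
        memcpy(out, data, sizeof(DeletePageV2));
    else if (len == sizeof(DeletePageV1))
        memcpy(out, data, sizeof(DeletePageV1));   /* deleteXid stays 0 */
    else
        return -1;
    return 0;
}
```

An old-format record decodes with deleteXid left at zero, which corresponds to "skip processing of the new field" in the commit message.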
Tom Lane [Sat, 23 Mar 2019 21:40:19 +0000 (17:40 -0400)]
Remove inadequate check for duplicate "xml" PI.
I failed to think about PIs starting with "xml". We don't really
need this check at all, so just take it out. Oversight in
commit 8d1dadb25 et al.
Tom Lane [Sat, 23 Mar 2019 21:35:05 +0000 (17:35 -0400)]
Revert strlen -> strnlen optimization pre-v11.
We don't have a src/port substitute for that function in older branches,
so it fails on platforms lacking the function natively. Per buildfarm.
Tom Lane [Sat, 23 Mar 2019 20:51:26 +0000 (16:51 -0400)]
Ensure xmloption = content while restoring pg_dump output.
In combination with the previous commit, this ensures that valid XML
data can always be dumped and reloaded, whether it is "document"
or "content".
Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
Tom Lane [Sat, 23 Mar 2019 20:24:30 +0000 (16:24 -0400)]
Accept XML documents when xmloption = content, as required by SQL:2006+.
Previously we were using the SQL:2003 definition, which doesn't allow
this, but that creates a serious dump/restore gotcha: there is no
setting of xmloption that will allow all valid XML data. Hence,
switch to the 2006 definition.
Since libxml doesn't accept <!DOCTYPE> directives in the mode we
use for CONTENT parsing, the implementation is to detect <!DOCTYPE>
in the input and switch to DOCUMENT parsing mode. This should not
cost much, because <!DOCTYPE> should be close to the front of the
input if it's there at all. It's possible that this causes the
error messages for malformed input to be slightly different than
they were before, if said input includes <!DOCTYPE>; but that does
not seem like a big problem.
In passing, buy back a few cycles in parsing of large XML documents
by not doing strlen() of the whole input in parse_xml_decl().
Back-patch because dump/restore failures are not nice. This change
shouldn't break any cases that worked before, so it seems safe to
back-patch.
Chapman Flack (revised a bit by me)
Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
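The mode switch described in the commit above amounts to a short prolog scan. Below is a simplified C sketch that assumes hypothetical helpers, content_has_doctype() and choose_mode(), which are not PostgreSQL's actual code. The scan skips whitespace, processing instructions (including an XML declaration), and comments, then checks whether the next token is <!DOCTYPE. It stops at the first other token, which is why the check stays cheap even for large inputs. Real XML prolog syntax has more cases than this handles.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef enum { PARSE_CONTENT, PARSE_DOCUMENT } ParseMode;

/* Hypothetical helper: true if a <!DOCTYPE> follows only whitespace,
 * processing instructions, and comments at the front of the input. */
static bool content_has_doctype(const char *p)
{
    for (;;)
    {
        while (isspace((unsigned char) *p))
            p++;
        if (strncmp(p, "<?", 2) == 0)           /* XML decl or other PI */
        {
            const char *end = strstr(p + 2, "?>");
            if (end == NULL)
                return false;   /* malformed: let the real parser complain */
            p = end + 2;
        }
        else if (strncmp(p, "<!--", 4) == 0)    /* comment */
        {
            const char *end = strstr(p + 4, "-->");
            if (end == NULL)
                return false;
            p = end + 3;
        }
        else
            return strncmp(p, "<!DOCTYPE", 9) == 0;
    }
}

/* Parse CONTENT normally, but fall back to DOCUMENT mode for a DOCTYPE. */
static ParseMode choose_mode(const char *input)
{
    return content_has_doctype(input) ? PARSE_DOCUMENT : PARSE_CONTENT;
}
```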
Tom Lane [Tue, 19 Mar 2019 16:49:27 +0000 (12:49 -0400)]
Make checkpoint requests more robust.
Commit 6f6a6d8b1 introduced a delay of up to 2 seconds if we're trying
to request a checkpoint but the checkpointer hasn't started yet (or,
much less likely, our kill() call fails). However, buildfarm experience
shows that that's not quite enough for slow or heavily-loaded machines.
There's no good reason to assume that the checkpointer won't start
eventually, so we may as well make the timeout much longer, say 60 sec.
However, if the caller didn't say CHECKPOINT_WAIT, it seems like a bad
idea to be waiting at all, much less for as long as 60 sec. We can
remove the need for that, and make this whole thing more robust, by
adjusting the code so that the existence of a pending checkpoint
request is clear from the contents of shared memory, and making sure
that the checkpointer process will notice it at startup even if it did
not get a signal. In this way there's no need for a non-CHECKPOINT_WAIT
call to wait at all; if it can't send the signal, it can nonetheless
assume that the checkpointer will eventually service the request.
A potential downside of this change is that "kill -INT" on the checkpointer
process is no longer enough to trigger a checkpoint, should anyone be
relying on something so hacky. But there's no obvious reason to do it
like that rather than issuing a plain old CHECKPOINT command, so we'll
assume that nobody is. There doesn't seem to be a way to preserve this
undocumented quasi-feature without introducing race conditions.
Since a principal reason for messing with this is to prevent intermittent
buildfarm failures, back-patch to all supported branches.
Discussion: https://postgr.es/m/27830.1552752475@sss.pgh.pa.us
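The design in the commit above records the request in shared memory first and treats the signal only as a wakeup hint. Below is a minimal C11 sketch of that ordering. It assumes a hypothetical CheckpointShmem struct (the real shared-memory layout differs) and a try_signal() stub in place of kill(). A request made while no checkpointer exists is still consumed when one starts, which is why a caller that did not ask for CHECKPOINT_WAIT no longer needs to wait.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared-memory area; nonzero flags mean a pending request. */
typedef struct
{
    atomic_int ckpt_flags;
} CheckpointShmem;

/* Stub standing in for kill(pid, SIGINT): succeeds only if a process exists. */
static bool try_signal(int checkpointer_pid)
{
    return checkpointer_pid != 0;
}

/* Backend side: publish the request, then signal on a best-effort basis.
 * Returns whether the wakeup was delivered; the request stands either way. */
static bool request_checkpoint(CheckpointShmem *shm, int flags,
                               int checkpointer_pid)
{
    atomic_fetch_or(&shm->ckpt_flags, flags);
    return try_signal(checkpointer_pid);
}

/* Checkpointer side: called at startup and after every wakeup, so a request
 * made before the process existed is not lost. Returns the pending flags. */
static int consume_request(CheckpointShmem *shm)
{
    return atomic_exchange(&shm->ckpt_flags, 0);
}
```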
Tom Lane [Thu, 14 Mar 2019 16:16:10 +0000 (12:16 -0400)]
Ensure dummy paths have correct required_outer if rel is parameterized.
The assertions added by commits 34ea1ab7f et al. found another problem:
set_dummy_rel_pathlist and mark_dummy_rel were failing to label
the dummy paths they create with the correct outer_relids, in case
the relation is necessarily parameterized due to having lateral
references in its tlist. It's likely that this has no user-visible
consequences in production builds, at the moment; but still an assertion
failure is a bad thing, so back-patch the fix.
Per bug #15694 from Roman Zharkov (via Alexander Lakhin)
and an independent report by Tushar Ahuja.
Discussion: https://postgr.es/m/15694-74f2ca97e7044f7f@postgresql.org
Discussion: https://postgr.es/m/7d72ab20-c725-3ce2-f99d-4e64dd8a0de6@enterprisedb.com
Michael Meskes [Mon, 11 Mar 2019 15:11:16 +0000 (16:11 +0100)]
Fix potential memory access violation in ecpg if the filename of an include
file is shorter than 2 characters.
Patch by: "Wu, Fei"
Tom Lane [Sun, 10 Mar 2019 16:58:52 +0000 (12:58 -0400)]
Disallow NaN as a value for floating-point GUCs.
None of the code that uses GUC values is really prepared for them to
hold NaN, but parse_real() didn't have any defense against accepting
such a value. Treat it the same as a syntax error.
I haven't attempted to analyze the exact consequences of setting any
of the float GUCs to NaN, but since they're quite unlikely to be good,
this seems like a back-patchable bug fix.
Note: we don't need an explicit test for +-Infinity because those will
be rejected by existing range checks. I added a regression test for
that in HEAD, but not older branches because the spelling of the value
in the error message will be platform-dependent in branches where we
don't always use port/snprintf.c.
Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
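The defense described above can be sketched in C as a strtod()-based parser that rejects NaN the same way it rejects a syntax error. This is a simplified sketch, not PostgreSQL's parse_real(): parse_real_sketch() is a hypothetical name, and the sketch ignores the unit suffixes and trailing whitespace that the real GUC parser accepts. Infinity is deliberately let through, because, as the commit notes, the existing range checks reject it later.

```c
#include <errno.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse value as a double. strtod() itself accepts "nan", so NaN must be
 * rejected explicitly. Returns false on bad syntax, overflow, or NaN. */
static bool parse_real_sketch(const char *value, double *result)
{
    char   *endptr;
    double  val;

    errno = 0;
    val = strtod(value, &endptr);
    if (endptr == value || *endptr != '\0' || errno != 0)
        return false;           /* syntax error or out-of-range value */
    if (isnan(val))
        return false;           /* treat NaN the same as a syntax error */
    *result = val;
    return true;
}
```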
Tom Lane [Sat, 9 Mar 2019 23:42:19 +0000 (18:42 -0500)]
Simplify release-note links to back branches.
Now that https://www.postgresql.org/docs/release/ is populated,
replace the stopgap text we had under "Prior Releases" with
a pointer to that archive.
Discussion: https://postgr.es/m/e0f09c9a-bd2b-862a-d379-601dfabc8969@postgresql.org
Michael Paquier [Mon, 4 Mar 2019 00:50:24 +0000 (09:50 +0900)]
Fix error handling of readdir() port implementation on first file lookup
The implementation of readdir() in src/port/, which is used by MSVC builds,
was added in 399a36a, and since the beginning it has treated every error
on the first file lookup as ENOENT, setting errno accordingly and
letting the caller think that the directory is empty. While
this is normally enough for the backend, it can confuse
callers of this routine on Windows, because all errors map to the same
behavior. For example, even a permission error would be taken to mean
an empty directory, although the directory could have contents.
This commit changes the error handling so that readdir() behaves like
native implementations: force errno=0 when seeing
ERROR_FILE_NOT_FOUND and treat other errors as plain
failures.
While looking at the patch, I noticed that MinGW does not set
errno=0 when looking at the first file, though it does on subsequent
file lookups. A comment in the code related to that was incorrect.
Reported-by: Yuri Kurenkov
Diagnosed-by: Yuri Kurenkov, Grigory Smolkin
Author: Konstantin Knizhnik
Reviewed-by: Andrew Dunstan, Michael Paquier
Discussion: https://postgr.es/m/2cad7829-8d66-e39c-b937-ac825db5203d@postgrespro.ru
Backpatch-through: 9.4
Dean Rasheed [Sun, 3 Mar 2019 10:58:45 +0000 (10:58 +0000)]
Further fixing for multi-row VALUES lists for updatable views.
Previously, rewriteTargetListIU() generated a list of attribute
numbers from the targetlist, which were passed to rewriteValuesRTE(),
which expected them to contain the same number of entries as there are
columns in the VALUES RTE, and to be in the same order. That was fine
when the target relation was a table, but for an updatable view it
could be broken in at least three different ways ---
rewriteTargetListIU() could insert additional targetlist entries for
view columns with defaults, the view columns could be in a different
order from the columns of the underlying base relation, and targetlist
entries could be merged together when assigning to elements of an
array or composite type. As a result, when recursing to the base
relation, the list of attribute numbers generated from the rewritten
targetlist could no longer be relied upon to match the columns of the
VALUES RTE. We got away with that prior to 41531e42d3 because it used
to always be the case that rewriteValuesRTE() did nothing for the
underlying base relation, since all DEFAULTs had already been replaced
when it was initially invoked for the view, but that was incorrect
because it failed to apply defaults from the base relation.
Fix this by examining the targetlist entries more carefully and
picking out just those that are simple Vars referencing the VALUES
RTE. That's sufficient for the purposes of rewriteValuesRTE(), which
is only responsible for dealing with DEFAULT items in the VALUES
RTE. Any DEFAULT item in the VALUES RTE that doesn't have a matching
simple-Var-assignment in the targetlist is an error which we complain
about, but in theory that ought to be impossible.
Additionally, move this code into rewriteValuesRTE() to give a clearer
separation of concerns between the two functions. There is no need for
rewriteTargetListIU() to know about the details of the VALUES RTE.
While at it, fix the comment for rewriteValuesRTE() which claimed that
it doesn't support array element and field assignments --- that hasn't
been true since a3c7a993d5 (9.6 and later).
Back-patch to all supported versions, with minor differences for the
pre-9.6 branches, which don't support array element and field
assignments to the same column in multi-row VALUES lists.
Reviewed by Amit Langote.
Discussion: https://postgr.es/m/15623-5d67a46788ec8b7f@postgresql.org
Michael Paquier [Thu, 28 Feb 2019 02:02:40 +0000 (11:02 +0900)]
Improve documentation of data_sync_retry
A change to this parameter takes effect only after a server restart, which
was mentioned neither in the documentation nor in postgresql.conf.sample.
Reported-by: Thomas Poty
Discussion: https://postgr.es/m/15659-0cd812f13027a2d8@postgresql.org
Tom Lane [Sun, 24 Feb 2019 17:51:51 +0000 (12:51 -0500)]
Fix ecpg bugs caused by missing semicolons in the backend grammar.
The Bison documentation clearly states that a semicolon is required
after every grammar rule, and our scripts that generate ecpg's
grammar from the backend's implicitly assumed this is true. But it
turns out that only ancient versions of Bison actually enforce that.
There have been a couple of rules without trailing semicolons in
gram.y for some time, and as a consequence, ecpg's grammar was faulty
and produced wrong output for the affected statements.
To fix, add the missing semis, and add some cross-checks to ecpg's
scripts so that they'll bleat if we mess this up again.
The cases that were broken were:
* "SET variable = DEFAULT" (but not "SET variable TO DEFAULT"),
as well as allied syntaxes such as ALTER SYSTEM SET ... DEFAULT.
These produced syntactically invalid output that the server
would reject.
* Multiple type names in DROP TYPE/DOMAIN commands. Only the
first type name would be listed in the emitted command.
Per report from Daisuke Higuchi. Back-patch to all supported versions.
Discussion: https://postgr.es/m/1803D792815FC24D871C00D17AE95905DB51CE@g01jpexmbkw24
Thomas Munro [Sun, 24 Feb 2019 10:59:26 +0000 (23:59 +1300)]
Tolerate EINVAL when calling fsync() on a directory.
Previously, we tolerated EBADF as a way for the operating system to
indicate that it doesn't support fsync() on a directory. Tolerate
EINVAL too, for older versions of Linux CIFS.
Bug #15636. Back-patch all the way.
Reported-by: John Klann
Discussion: https://postgr.es/m/15636-d380890dafd78fc6@postgresql.org
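The tolerance described above can be sketched with plain POSIX calls. The function fsync_dir_tolerant() is a hypothetical name, and this sketch is not PostgreSQL's own fsync helper. It opens the directory read-only, fsyncs it, and treats EBADF and EINVAL as "directory fsync not supported here" rather than as failures. It returns 0 on success, or -1 with errno set.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* fsync() a directory, tolerating the errors some systems use to say that
 * directory fsync is unsupported (EBADF, or EINVAL on older Linux CIFS). */
static int fsync_dir_tolerant(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fsync(fd) != 0 && errno != EBADF && errno != EINVAL)
    {
        int save_errno = errno;     /* close() may clobber errno */

        close(fd);
        errno = save_errno;
        return -1;
    }
    close(fd);
    return 0;
}
```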
Tom Lane [Fri, 22 Feb 2019 17:23:00 +0000 (12:23 -0500)]
Fix plan created for inherited UPDATE/DELETE with all tables excluded.
In the case where inheritance_planner() finds that every table has
been excluded by constraints, it thought it could get away with
making a plan consisting of just a dummy Result node. While certainly
there's no updating or deleting to be done, this had two user-visible
problems: the plan did not report the correct set of output columns
when a RETURNING clause was present, and if there were any
statement-level triggers that should be fired, it didn't fire them.
Hence, rather than only generating the dummy Result, we need to
stick a valid ModifyTable node on top, which requires a tad more
effort here.
It's been broken this way for as long as inheritance_planner() has
known about deleting excluded subplans at all (cf commit 635d42e9c),
so back-patch to all supported branches.
Amit Langote and Tom Lane, per a report from Petr Fedorov.
Discussion: https://postgr.es/m/5da6f0f0-1364-1876-6978-907678f89a3e@phystech.edu
Dean Rasheed [Wed, 20 Feb 2019 08:19:55 +0000 (08:19 +0000)]
Fix DEFAULT-handling in multi-row VALUES lists for updatable views.
INSERT ... VALUES for a single VALUES row is implemented differently
from a multi-row VALUES list, which causes inconsistent behaviour in
the way that DEFAULT items are handled. In particular, when inserting
into an auto-updatable view on top of a table with a column default, a
DEFAULT item in a single VALUES row gets correctly replaced with the
table column's default, but for a multi-row VALUES list it is replaced
with NULL.
Fix this by allowing rewriteValuesRTE() to leave DEFAULT items in the
VALUES list untouched if the target relation is an auto-updatable view
and has no column default, deferring DEFAULT-expansion until the query
against the base relation is rewritten. For all other types of target
relation, including tables and trigger- and rule-updatable views, we
must continue to replace DEFAULT items with NULL in the absence of a
column default.
This is somewhat complicated by the fact that if an auto-updatable
view has DO ALSO rules attached, the VALUES lists for the product
queries need to be handled differently from the original query, since
the product queries need to act like rule-updatable views whereas the
original query has auto-updatable view semantics.
Back-patch to all supported versions.
Reported by Roger Curley (bug #15623). Patch by Amit Langote and me.
Discussion: https://postgr.es/m/15623-5d67a46788ec8b7f@postgresql.org
Michael Paquier [Wed, 20 Feb 2019 03:32:23 +0000 (12:32 +0900)]
Correctly mark initial slot snapshots with MVCC type when built
When building an initial slot snapshot, the snapshot is marked as a
historic MVCC snapshot, with the type field being set in
SnapBuildBuildSnapshot(), but that marker was not overridden in
SnapBuildExportSnapshot().
Existing callers of SnapBuildBuildSnapshot() do not care about the type
of snapshot used, but extensions calling it actually may, as reported.
Author: Antonin Houska
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/23215.1527665193@localhost
Backpatch-through: 9.4
Joe Conway [Sun, 17 Feb 2019 18:14:29 +0000 (13:14 -0500)]
Fix documentation for dblink_error_message() return value
The dblink documentation claims that an empty string is returned if there
has been no error; however, "OK" is actually returned in that case. Also,
clarify that an async error may not be seen unless dblink_is_busy() or
dblink_get_result() has been called first.
Backpatch to all supported branches.
Reported-by: realyota
Backpatch-through: 9.4
Discussion: https://postgr.es/m/153371978486.1298.2091761143788088262@wrigleys.postgresql.org
Tom Lane [Sun, 17 Feb 2019 17:37:32 +0000 (12:37 -0500)]
Fix CREATE VIEW to allow zero-column views.
We should logically have allowed this case when we allowed zero-column
tables, but it was overlooked.
Although this might be thought a feature addition, it's really a bug
fix, because it was possible to create a zero-column view via
the convert-table-to-view code path, and then you'd have a situation
where dump/reload would fail. Hence, back-patch to all supported
branches.
Arrange the added test cases to provide coverage of the related
pg_dump code paths (since these views will be dumped and reloaded
during the pg_upgrade regression test). I also made them test
the case where pg_dump has to postpone the view rule into post-data,
which disturbingly had no regression coverage before.
Report and patch by Ashutosh Sharma (test case by me)
Discussion: https://postgr.es/m/CAE9k0PkmHdeSaeZt2ujnb_cKucmK3sDDceDzw7+d5UZoNJPYOg@mail.gmail.com
Thomas Munro [Thu, 14 Feb 2019 21:19:11 +0000 (10:19 +1300)]
Fix race in dsm_attach() when handles are reused.
DSM handle values can be reused as soon as the underlying shared memory
object has been destroyed. That means that for a brief moment we
might have two DSM slots with the same handle. While trying to attach,
if we encounter a slot with refcnt == 1, meaning that it is currently
being destroyed, we should continue our search in case the same handle
exists in another slot.
The race manifested as a rare "dsa_area could not attach to segment"
error, and was more likely in 10 and 11 due to the lack of distinct
seed for random() in parallel workers. It was made very unlikely in
master by commit 197e4af9, and older releases don't usually create
new DSM segments in background workers, so it was also unlikely there.
This fixes the root cause of bug report #15585, in which the error
could also sometimes result in a self-deadlock in the error path.
It's not yet clear if further changes are needed to avoid that failure
mode.
Back-patch to 9.4, where dsm.c arrived.
Author: Thomas Munro
Reported-by: Justin Pryzby, Sergei Kornilov
Discussion: https://postgr.es/m/20190207014719[email protected]
Discussion: https://postgr.es/m/15585-324ff6a93a18da46@postgresql.org
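The search rule described above can be sketched in C. DsmSlot and find_attachable_slot() are hypothetical simplifications, not dsm.c's real control-segment structures. The refcnt convention follows the commit message: 0 means a free slot, 1 means a segment being destroyed, and anything greater means a live segment. The key point is to skip a dying slot and keep scanning, because a reused handle may sit in a later slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot: 0 = free, 1 = being destroyed, >1 = attachable. */
typedef struct
{
    uint32_t handle;
    uint32_t refcnt;
} DsmSlot;

/* Return the index of a live slot holding handle, or -1 if none exists. */
static int find_attachable_slot(const DsmSlot *slots, size_t nslots,
                                uint32_t handle)
{
    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].refcnt == 0 || slots[i].handle != handle)
            continue;
        if (slots[i].refcnt == 1)
            continue;       /* being destroyed: the handle may be reused later */
        return (int) i;
    }
    return -1;
}
```

With slots {42, 1} and {42, 2}, stopping at the first match would report the segment as unattachable. Continuing the scan finds the live segment in the second slot.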
Alvaro Herrera [Tue, 12 Feb 2019 21:42:37 +0000 (18:42 -0300)]
Relax overly strict assertion
Ever since its birth, ReorderBufferBuildTupleCidHash() has contained an
assertion that a catalog tuple cannot change Cmax after acquiring one. But
that's wrong: if a subtransaction executes DDL that affects that catalog
tuple, and later aborts and another DDL affects the same tuple, it will
change Cmax. Relax the assertion to merely verify that the Cmax remains
valid and monotonically increasing, instead.
Add a test that tickles the relevant code.
Diagnosed by, and initial patch submitted by: Arseny Sher
Co-authored-by: Arseny Sher
Discussion: https://postgr.es/m/874l9p8hyw.fsf@ars-thinkpad
Tom Lane [Tue, 12 Feb 2019 06:12:52 +0000 (01:12 -0500)]
Fix erroneous error reports in snapbuild.c.
It's pretty unhelpful to report the wrong file name in a complaint
about syscall failure, but SnapBuildSerialize managed to do that twice
in a span of 50 lines. Also fix half a dozen missing or poorly-chosen
errcode assignments; that's mostly cosmetic, but still wrong.
Noted while studying recent failures on buildfarm member nightjar.
I'm not sure whether those reports are actually giving the wrong
filename, because there are two places here with identically
spelled error messages. The other one is specifically coded not
to report ENOENT, but if it's this one, how could we be getting
ENOENT from open() with O_CREAT? Need to sit back and await results.
However, these ereports are clearly broken from birth, so back-patch.
Tom Lane [Mon, 11 Feb 2019 21:24:38 +0000 (16:24 -0500)]
Stamp 9.4.21.
Peter Eisentraut [Mon, 11 Feb 2019 13:10:14 +0000 (14:10 +0100)]
Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: db7a038785d2919e23e222be1ffa1866087fa803
Tom Lane [Sun, 10 Feb 2019 20:44:05 +0000 (15:44 -0500)]
Release notes for 11.2, 10.7, 9.6.12, 9.5.16, 9.4.21.
Tom Lane [Sun, 10 Feb 2019 00:45:38 +0000 (19:45 -0500)]
Repair unsafe/unportable snprintf usage in pg_restore.
warn_or_exit_horribly() was blithely passing a potentially-NULL
string pointer to a %s format specifier. That works (at least
to the extent of not crashing) on some platforms, but not all,
and since we switched to our own snprintf.c it doesn't work
for us anywhere.
Of the three string fields being handled this way here, I think
that only "owner" is supposed to be nullable ... but considering
that this is error-reporting code, it has very little business
assuming anything, so put in defenses for all three.
Per a crash observed on buildfarm member crake and then
reproduced here. Because of the portability aspect,
back-patch to all supported versions.
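The defensive pattern described above is to never pass a possibly-NULL pointer to %s, because that is undefined behavior and crashes under some snprintf implementations. Below is a small C sketch. The helper str_or_placeholder(), the "(null)" placeholder, and the message format are all hypothetical, not pg_restore's actual text.

```c
#include <stdio.h>

/* Substitute a placeholder for NULL before it reaches a %s specifier. */
static const char *str_or_placeholder(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Hypothetical error-report formatter: no argument is assumed non-NULL,
 * since error-reporting code should not trust its inputs. */
static int format_entry(char *buf, size_t len, const char *desc,
                        const char *tag, const char *owner)
{
    return snprintf(buf, len, "%s %s (owner %s)",
                    str_or_placeholder(desc), str_or_placeholder(tag),
                    str_or_placeholder(owner));
}
```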
Tom Lane [Fri, 8 Feb 2019 18:30:42 +0000 (13:30 -0500)]
Defend against null error message reported by libxml2.
While this isn't really supposed to happen, it can occur in OOM
situations and perhaps others. Instead of crashing, substitute
"(no message provided)".
I didn't worry about localizing this text, since we aren't
localizing anything else here; besides, if we're on the edge of
OOM, it's unlikely gettext() would work.
Report and fix by Sergio Conde Gómez in bug #15624.
Discussion: https://postgr.es/m/15624-4dea54091a2864e6@postgresql.org
Tom Lane [Fri, 8 Feb 2019 17:49:36 +0000 (12:49 -0500)]
Doc: fix thinko in description of how to escape a backslash in bytea.
Also clean up some discussion that had been left in a very confused
state thanks to half-hearted adjustments for the change to
standard_conforming_strings being the default.
Discussion: https://postgr.es/m/154954987367.1297.4358910045409218@wrigleys.postgresql.org
Tom Lane [Thu, 7 Feb 2019 18:10:46 +0000 (13:10 -0500)]
Ensure that foreign scans with lateral refs are planned correctly.
As reported in bug #15613 from Srinivasan S A, file_fdw and postgres_fdw
neglected to mark plain baserel foreign paths as parameterized when the
relation has lateral_relids. Other FDWs have surely copied this mistake,
so rather than just patching those two modules, install a band-aid fix
in create_foreignscan_path to rectify the mistake centrally.
Although the band-aid is enough to fix the visible symptom, correct
the calls in file_fdw and postgres_fdw anyway, so that they are valid
examples for external FDWs.
Also, since the band-aid isn't enough to make this work for parameterized
foreign joins, throw an elog(ERROR) if such a case is passed to
create_foreignscan_path. This shouldn't pose much of a problem for
existing external FDWs, since it's likely they aren't trying to make such
paths anyway (though some of them may need a defense against joins with
lateral_relids, similar to the one this patch installs into postgres_fdw).
Add some assertions in relnode.c to catch future occurrences of the same
error --- in particular, as a backstop against core-code mistakes like the
one fixed by commit bdd9a99aa.
Discussion: https://postgr.es/m/15613-092be1be9576c728@postgresql.org
Tom Lane [Wed, 6 Feb 2019 17:44:59 +0000 (12:44 -0500)]
Propagate lateral-reference information to indirect descendant relations.
create_lateral_join_info() computes a bunch of information about lateral
references between base relations, and then attempts to propagate those
markings to appendrel children of the original base relations. But the
original coding neglected the possibility of indirect descendants
(grandchildren etc). During v11 development we noticed that this was
wrong for partitioned-table cases, but failed to realize that it was just
as wrong for any appendrel. While the case can't arise for appendrels
derived from traditional table inheritance (because we make a flat
appendrel for that), nested appendrels can arise from nested UNION ALL
subqueries. Failure to mark the lower-level relations as having lateral
references leads to confusion in add_paths_to_append_rel about whether
unparameterized paths can be built. It's not very clear whether that
leads to any user-visible misbehavior; the lack of field reports suggests
that it may cause nothing worse than minor cost misestimation. Still,
it's a bug, and it leads to failures of Asserts that I intend to add
later.
To fix, we need to propagate information from all appendrel parents,
not just those that are RELOPT_BASERELs. We can still do it in one
pass, if we rely on the append_rel_list to be ordered with ancestor
relationships before descendant ones; add assertions checking that.
While fixing this, we can make a small performance improvement by
traversing the append_rel_list just once instead of separately for
each appendrel parent relation.
Noted while investigating bug #15613, though this patch does not fix
that (which is why I'm not committing the related Asserts yet).
Discussion: https://postgr.es/m/3951.1549403812@sss.pgh.pa.us
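The one-pass propagation described in the commit above depends on the append_rel_list ordering invariant. Below is a C sketch that assumes a hypothetical AppendLink array and uses one bitmask per relation to stand in for the planner's lateral_relids. The sketch is only meant to show why ancestors-before-descendants ordering lets a single forward pass reach grandchildren. The inner loop is an O(n^2) check of that invariant, in the spirit of the assertions the commit adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical appendrel link between relation ids (array indexes). */
typedef struct
{
    int parent;
    int child;
} AppendLink;

/* Merge each parent's lateral-reference bits into its child in one pass.
 * Correct for nested appendrels only if every link that produces relation X
 * comes before any link whose parent is X. */
static void propagate_lateral(const AppendLink *links, size_t nlinks,
                              uint32_t *lateral_relids)
{
    for (size_t i = 0; i < nlinks; i++)
    {
        /* No later link may make this parent someone's child. */
        for (size_t j = i + 1; j < nlinks; j++)
            assert(links[j].child != links[i].parent);

        lateral_relids[links[i].child] |= lateral_relids[links[i].parent];
    }
}
```

With links 0 to 1 and 1 to 2, relation 2 (a grandchild of 0) receives relation 0's markings. The older per-parent approach described in the commit skipped it, because relation 1 is not a RELOPT_BASEREL.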
Andrew Dunstan [Wed, 6 Feb 2019 12:32:35 +0000 (07:32 -0500)]
Unify searchpath and do file logic in MSVC build scripts.
Commit f83419b739 failed to notice that mkvcbuild.pl and build.pl use
different searchpath and do-file logic, which broke build.pl, so build.pl is
now adjusted to use the same logic as mkvcbuild.pl.
Andrew Dunstan [Tue, 5 Feb 2019 23:57:12 +0000 (18:57 -0500)]
Fix included file path for modern perl
Contrary to the comment on 772d4b76, only paths starting with "./" or
"../" are considered relative to the current working directory by perl's
"do" function. So this patch converts all the relevant cases to use "./"
paths. This only affects MSVC.
Backpatch to all live branches.
Andrew Dunstan [Tue, 5 Feb 2019 23:31:10 +0000 (18:31 -0500)]
More fixes for modern perl on back branches
Use "do" instead of "require" for included files, as it doesn't look for
them in the search path but relative to the current working directory.
These changes have already been made to REL_10_STABLE and later, to
satisfy the demands of perlcritic, but need backporting now to earlier
branches.
Andrew Dunstan [Tue, 5 Feb 2019 20:16:55 +0000 (15:16 -0500)]
Keep perl style checker happy
It doesn't like code before "use strict;".
Tom Lane [Tue, 5 Feb 2019 15:58:53 +0000 (10:58 -0500)]
Update time zone data files to tzdata release 2018i.
DST law changes in Kazakhstan, Metlakatla, and São Tomé and Príncipe.
Kazakhstan's Qyzylorda zone is split in two, creating a new zone
Asia/Qostanay, as some areas did not change UTC offset.
Historical corrections for Hong Kong and numerous Pacific islands.
Andrew Dunstan [Tue, 5 Feb 2019 14:59:46 +0000 (09:59 -0500)]
Fix searchpath for modern Perl for genbki.pl
This was fixed for MSVC tools by commit 1df92eeafefac4, but, per
buildfarm member bowerbird, genbki.pl needs the same treatment.
Backpatch to all live branches.
Tom Lane [Tue, 5 Feb 2019 00:18:50 +0000 (19:18 -0500)]
Doc: in each release branch, keep only that branch's own release notes.
Historically we've had each release branch include all prior branches'
notes, including minor-release changes, back to the beginning of the
project. That's basically an O(N^2) proposition, and it was starting to
catch up with us: as of HEAD the back-branch release notes alone accounted
for nearly 30% of the documentation. While there's certainly some value
in easy access to back-branch notes, this is getting out of hand.
Hence, switch over to the rule that each branch contains only its own
release notes. So as to not make older notes too hard to find, each
branch will provide URLs for the immediately preceding branches'
release notes on the project website.
There might be value in providing aggregated notes across all branches
somewhere on the website, but that's a task for another day.
Discussion: https://postgr.es/m/cbd4aeb5-2d9c-8b84-e968-9e09393d4c83@postgresql.org
Tom Lane [Mon, 4 Feb 2019 22:20:02 +0000 (17:20 -0500)]
Fix dumping of matviews with indirect dependencies on primary keys.
Commit 62215de29 turns out to have been not quite on-the-mark.
When we are forced to postpone dumping of a materialized view into
the dump's post-data section (because it depends on a unique index
that isn't created till that section), we may also have to postpone
dumping other matviews that depend on said matview. The previous fix
didn't reliably work for such cases: it'd break the dependency loops
properly, producing a workable object ordering, but it didn't
necessarily mark all the matviews as "postponed_def". This led to
harmless bleating about "archive items not in correct section order",
as reported by Tom Cassidy in bug #15602. Less harmlessly,
selective-restore options such as --section might misbehave due to
the matview dump objects not being properly labeled.
The right way to fix it is to consider that each pre-data dependency
we break amounts to moving the no-longer-dependent object into
post-data, and hence we should mark that object if it's a matview.
Back-patch to all supported versions, since the issue's been there
since matviews were introduced.
Discussion: https://postgr.es/m/15602-e895445f73dc450b@postgresql.org
Michael Paquier [Sun, 3 Feb 2019 08:49:04 +0000 (17:49 +0900)]
Add PG_CFLAGS, PG_CXXFLAGS, and PG_LDFLAGS variables to PGXS
Add PG_CFLAGS, PG_CXXFLAGS, and PG_LDFLAGS variables to pgxs.mk which
will be appended or prepended to the corresponding make variables.
Notably, there was previously no way to pass custom CXXFLAGS to third-party
extension module builds; COPT and PROFILE support only CFLAGS
and LDFLAGS.
Backpatch all the way down to ease integration with existing
extensions.
Author: Christoph Berg
Reviewed-by: Andres Freund, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/20181113104005[email protected]
Backpatch-through: 9.4
Amit Kapila [Sat, 2 Feb 2019 10:13:58 +0000 (15:43 +0530)]
Avoid possible deadlock while locking multiple heap pages.
To avoid deadlock, a backend acquires locks on heap pages in block-number
order. In certain cases, the locks on heap pages are dropped and
reacquired, namely when they must be dropped to read in the
corresponding VM page(s). The issue is that we reacquired the locks in
buffer-ID order, whereas the intention was to acquire them in
block-number order.
This commit ensures that we will always acquire locks on heap pages in
block-number order.
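A minimal sketch of the ordering rule, assuming a hypothetical `HeapPage` type that pairs a block number with a pthread mutex standing in for the buffer content lock (the real code works on `Buffer`s and `LockBuffer()`). Whatever order the caller passes the pages in, the lower block number is locked first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap page: its block number plus a mutex
 * playing the role of the buffer content lock. */
typedef struct {
    unsigned int blkno;
    pthread_mutex_t lock;
} HeapPage;

/* Lock two heap pages in block-number order, regardless of argument
 * order.  Because every caller follows the same rule, two callers can
 * never wait on each other in a cycle.  Returns the page locked first. */
static HeapPage *lock_pages_in_block_order(HeapPage *a, HeapPage *b)
{
    HeapPage *first = a;
    HeapPage *second = b;

    assert(a != NULL && b != NULL);
    if (a == b)
    {
        pthread_mutex_lock(&a->lock);   /* same page: lock it only once */
        return a;
    }
    if (b->blkno < a->blkno)
    {
        first = b;
        second = a;
    }
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    return first;
}

static void unlock_pages(HeapPage *a, HeapPage *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}
```

Ordering by buffer ID instead would be just as deterministic per call, but buffer IDs bear no fixed relation to block numbers, so it could invert the order another backend uses.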
Reported-by: Nishant Fnu
Author: Nishant Fnu
Reviewed-by: Amit Kapila and Robert Haas
Backpatch-through: 9.4
Discussion: https://postgr.es/m/5883C831-2ED1-47C8-BFAC-2D5BAE5A8CAE@amazon.com
Michael Paquier [Fri, 1 Feb 2019 01:36:02 +0000 (10:36 +0900)]
Fix use of dangling pointer in heap_delete() when logging replica identity
When logging the replica identity of a deleted tuple, XLOG_HEAP_DELETE
records include a reference to the old tuple. Its data is stored in an
intermediate local variable used to register this information for the WAL
record, but that variable has already gone out of scope by the time the
record is actually inserted.
Spotted by clang's AddressSanitizer.
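The bug pattern can be sketched in isolation. The names below (`register_data()`, `insert_record()`, `OldTupleHeader`, `log_delete()`) are hypothetical stand-ins; like `XLogRegisterData()`, `register_data()` keeps only a pointer, so the registered variable must stay in scope until `insert_record()` copies the bytes. The fixed shape declares the variable at function scope rather than inside the block that fills it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record builder: registration stores a pointer only and
 * the bytes are copied later, at insert time. */
typedef struct {
    const void *data;
    size_t len;
} RegisteredChunk;

static RegisteredChunk pending;

static void register_data(const void *data, size_t len)
{
    assert(data != NULL);
    pending.data = data;            /* no copy yet */
    pending.len = len;
}

static size_t insert_record(char *out, size_t outlen)
{
    size_t n = pending.len < outlen ? pending.len : outlen;

    if (n > 0)
        memcpy(out, pending.data, n);   /* deferred copy happens here */
    pending.data = NULL;                /* registration is consumed */
    pending.len = 0;
    return n;
}

typedef struct { int natts; int infomask; } OldTupleHeader;

static size_t log_delete(int log_old_tuple, char *out, size_t outlen)
{
    /* Function scope: declaring hdr inside the if-block instead would
     * leave pending.data dangling when insert_record() runs. */
    OldTupleHeader hdr;

    if (log_old_tuple)
    {
        hdr.natts = 3;
        hdr.infomask = 0x10;
        register_data(&hdr, sizeof(hdr));
    }
    return insert_record(out, outlen);  /* hdr is still alive here */
}
```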
Author: Stas Kelvish
Discussion: https://postgr.es/m/085C8825-AD86-4E93-AF80-E26CDF03D1EA@postgrespro.ru
Backpatch-through: 9.4
Peter Eisentraut [Mon, 28 Jan 2019 21:09:33 +0000 (22:09 +0100)]
Fix a crash in logical replication
The bug was that determining which columns are part of the replica
identity index using RelationGetIndexAttrBitmap() would run
eval_const_expressions() on index expressions and predicates across
all indexes of the table, which in turn might require a snapshot, but
there wasn't one set, so it crashes. There were actually two separate
bugs, one on the publisher and one on the subscriber.
To trigger the bug, a table that is part of a publication or
subscription needs to have an index with a predicate or expression
that lends itself to constant-expression simplification.
The fix is to avoid the constant expressions simplification in
RelationGetIndexAttrBitmap(), so that it becomes safe to call in these
contexts. The constant expressions simplification comes from the
calls to RelationGetIndexExpressions()/RelationGetIndexPredicate() via
BuildIndexInfo(). But having RelationGetIndexAttrBitmap() call
BuildIndexInfo() is overkill. The latter just takes pg_index catalog
information and packs it into the IndexInfo structure, which the former
then just unpacks again and throws away. We can just do this directly with
less overhead and skip the troublesome calls to
eval_const_expressions(). This also removes the awkward
cross-dependency between relcache.c and index.c.
Bug: #15114
Reported-by: Петър Славов <[email protected]>
Reviewed-by: Noah Misch <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/152110589574.1223.17983600132321618383@wrigleys.postgresql.org/
Tom Lane [Sat, 26 Jan 2019 02:14:31 +0000 (21:14 -0500)]
Allow UNLISTEN in hot-standby mode.
Since LISTEN is (still) disallowed, UNLISTEN must be a no-op in a
hot-standby session, and so there's no harm in allowing it. This
change allows client code to not worry about whether it's connected
to a primary or standby server when performing session-state-reset
type activities. (Note that DISCARD ALL, which includes UNLISTEN,
was already allowed, making it inconsistent to reject UNLISTEN.)
Per discussion, back-patch to all supported versions.
Shay Rojansky, reviewed by Mi Tar
Discussion: https://postgr.es/m/CADT4RqCf2gA_TJtPAjnGzkC3ZiexfBZiLmA-mV66e4UyuVv8bA@mail.gmail.com
Tom Lane [Thu, 24 Jan 2019 21:46:56 +0000 (16:46 -0500)]
Remove infinite-loop hazards in ecpg test suite.
A report from Andrew Dunstan showed that an ecpglib breakage that
causes repeated query failures could lead to infinite loops in some
ecpg test scripts, because they contain "while(1)" loops with no
exit condition other than successful test completion. That might
be all right for manual testing, but it seems entirely unacceptable
for automated test environments such as our buildfarm. We don't
want buildfarm owners to have to intervene manually when a test
goes wrong.
To fix, just change all those while(1) loops to exit after at most
100 iterations (which is more than any of them expect to iterate).
This seems sufficient since we'd see discrepancies in the test output
if any loop executed the wrong number of times.
I tested this by dint of intentionally breaking ecpg_do_prologue
to always fail, and verifying that the tests still got to completion.
Back-patch to all supported branches, since the whole point of this
exercise is to protect the buildfarm against future mistakes.
Discussion: https://postgr.es/m/18693.1548302004@sss.pgh.pa.us
Tom Lane [Thu, 24 Jan 2019 03:46:45 +0000 (22:46 -0500)]
Blind attempt to fix _configthreadlocale() failures on MinGW.
Apparently, some builds of MinGW contain a version of
_configthreadlocale() that always returns -1, indicating failure.
Rather than treating that as a curl-up-and-die condition, soldier on
as though the function didn't exist. This leaves us without thread
safety on such MinGW versions, but we didn't have it anyway.
Discussion: https://postgr.es/m/d06a16bc-52d6-9f0d-2379-21242d7dbe81@2ndQuadrant.com
Heikki Linnakangas [Wed, 23 Jan 2019 11:39:00 +0000 (13:39 +0200)]
Fix misc typos in comments.
Spotted mostly by Fabien Coelho.
Discussion: https://www.postgresql.org/message-id/alpine.DEB.2.21.1901230947050.16643@lancre
Tom Lane [Tue, 22 Jan 2019 04:18:58 +0000 (23:18 -0500)]
Avoid thread-safety problem in ecpglib.
ecpglib attempts to force the LC_NUMERIC locale to "C" while reading
server output, to avoid problems with strtod() and related functions.
Historically it's just issued setlocale() calls to do that, but that
has major problems if we're in a threaded application. setlocale()
itself is not required by POSIX to be thread-safe (and indeed is not,
on recent OpenBSD). Moreover, its effects are process-wide, so that
we could cause unexpected results in other threads, or another thread
could change our setting.
On platforms having uselocale(), which is required by POSIX:2008,
we can avoid these problems by using uselocale() instead. Windows
goes its own way as usual, but we can make it safe by using
_configthreadlocale(). Platforms having neither continue to use the
old code, but that should be pretty much nobody among current systems.
(Subsequent buildfarm results show that recent NetBSD versions still
lack uselocale(), but it's not a big problem because they also do not
support non-"C" settings for LC_NUMERIC.)
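A minimal sketch of the POSIX:2008 approach, not ecpglib's actual code: `parse_c_double()` is a hypothetical helper that switches only the calling thread to a "C" LC_NUMERIC locale with `newlocale()`/`uselocale()`, parses with `strtod()`, and restores the previous thread locale. The Windows `_configthreadlocale()` path is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Parse a number in server (C-locale) format without touching the
 * process-wide locale, so other threads are unaffected. */
static double parse_c_double(const char *s, char **endp)
{
    locale_t c_loc;
    locale_t old_loc;
    double result;

    assert(s != NULL);
    c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
    if (c_loc == (locale_t) 0)
        return strtod(s, endp);     /* fallback: current locale */

    old_loc = uselocale(c_loc);     /* affects this thread only */
    result = strtod(s, endp);
    uselocale(old_loc);             /* restore the previous setting */
    freelocale(c_loc);
    return result;
}
```

Unlike `setlocale()`, `uselocale()` changes only the calling thread's locale, which is exactly the property the paragraph above relies on.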
Back-patch of commits 8eb4a9312 and ee27584c4.
Michael Meskes and Tom Lane; thanks also to Takayuki Tsunakawa.
Discussion: https://postgr.es/m/31420.1547783697@sss.pgh.pa.us
Tomas Vondra [Sat, 19 Jan 2019 19:36:07 +0000 (20:36 +0100)]
Revert "Add valgrind suppressions for wcsrtombs optimizations"
This reverts commit 41344896364c4bf2229ec590c95cf23a6bec928e.
Per discussion, it's not desirable to add valgrind suppressions for code
outside our own code base (e.g. glibc in this case), especially when
the suppressions may be platform-specific. There are better ways to
deal with that, e.g. by providing local suppressions.
Discussion: https://www.postgresql.org/message-id/flat/90ac0452-e907-e7a4-b3c8-15bd33780e62%402ndquadrant.com
Tom Lane [Fri, 18 Jan 2019 20:06:26 +0000 (15:06 -0500)]
Use our own getopt() on OpenBSD.
Recent OpenBSD (at least 5.9 and up) has a version of getopt(3)
that will not cope with the "-:" spec we use to accept double-dash
options in postgres.c and postmaster.c. Admittedly, that's a hack
because POSIX only requires getopt() to allow alphanumeric option
characters. I have no desire to find another way, however, so
let's just do what we were already doing on Solaris: force use
of our own src/port/getopt.c implementation.
In passing, improve some of the comments around said implementation.
Per buildfarm and local testing. Back-patch to all supported branches.
Discussion: https://postgr.es/m/30197.1547835700@sss.pgh.pa.us
Magnus Hagander [Thu, 17 Jan 2019 12:52:51 +0000 (13:52 +0100)]
Replace references to mailinglists with @lists.postgresql.org
The namespace for all lists changed a while ago, so all references
should use the correct address.
Magnus Hagander [Thu, 17 Jan 2019 12:47:24 +0000 (13:47 +0100)]
Remove references to Majordomo
Lists are not handled by Majordomo anymore and haven't been for
a while, so remove the reference and instead direct people to the
list server.
Andrew Gierth [Thu, 17 Jan 2019 05:33:01 +0000 (05:33 +0000)]
Postpone aggregate checks until after collation is assigned.
Previously, parseCheckAggregates was run before
assign_query_collations, but this causes problems if any expression
has already had a collation assigned by some transform function (e.g.
transformCaseExpr) before parseCheckAggregates runs. The differing
collations would cause expressions not to be recognized as equal to
the ones in the GROUP BY clause, leading to spurious errors about
unaggregated column references.
The result was that CASE expr WHEN val ... would fail when "expr"
contained a GROUPING() expression or matched one of the group by
expressions, and where collatable types were involved; whereas the
supposedly identical CASE WHEN expr = val ... would succeed.
Backpatch all the way; this appears to have been wrong ever since
collations were introduced.
Per report from Guillaume Lelarge, analysis and patch by me.
Discussion: https://postgr.es/m/CAECtzeVSO_US8C2Khgfv54ZMUOBR4sWq+6_bLrETnWExHT=rFg@mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
Andrew Dunstan [Sun, 13 Jan 2019 21:43:14 +0000 (16:43 -0500)]
fix typo
Andrew Dunstan [Sun, 13 Jan 2019 20:59:35 +0000 (15:59 -0500)]
Make DLSUFFIX easily discoverable by build scripts
This will enable things like the buildfarm client to discover more
reliably if certain libraries have been installed.
Discussion: https://postgr.es/m/859e7c91-7ef4-d4b4-2ca2-8046e0cbee09@2ndQuadrant.com
Backpatch to all live branches.
Tom Lane [Tue, 8 Jan 2019 17:03:54 +0000 (12:03 -0500)]
Doc: update our docs about kernel IPC parameters on *BSD.
runtime.sgml said that you couldn't change SysV IPC parameters on OpenBSD
except by rebuilding the kernel. That's definitely wrong in OpenBSD 6.x,
and excavation in their man pages says it changed in OpenBSD 3.3.
Update NetBSD and OpenBSD sections to recommend adjustment of the SEMMNI
and SEMMNS settings, which are painfully small by default on those
platforms. (The discussion thread contemplated recommending that
people select POSIX semaphores instead, but the performance consequences
of that aren't really clear, so I'll refrain.)
Remove pointless discussion of SEMMNU and SEMMAP from the FreeBSD
section. Minor other wordsmithing.
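To make the SEMMNI/SEMMNS recommendation concrete, here is a small calculator. It assumes the formula from the PostgreSQL 10/11 "Managing Kernel Resources" documentation (semaphores are allocated in sets of 16, each set carrying a 17th "magic number" semaphore); other releases count slightly different process types, so check the docs for the version at hand. The results are minimums, before adding room for other applications.

```c
#include <assert.h>

/* ceil((max_connections + autovacuum_max_workers
 *       + max_worker_processes + 5) / 16), per the assumed formula */
static long min_semmni(long max_connections, long autovacuum_max_workers,
                       long max_worker_processes)
{
    long slots = max_connections + autovacuum_max_workers
                 + max_worker_processes + 5;

    assert(slots > 0);
    return (slots + 15) / 16;       /* integer ceiling division */
}

/* Each set of 16 needs 17 semaphores in total. */
static long min_semmns(long max_connections, long autovacuum_max_workers,
                       long max_worker_processes)
{
    return min_semmni(max_connections, autovacuum_max_workers,
                      max_worker_processes) * 17;
}
```

With default settings (100, 3, 8) this gives 116 slots, hence at least 8 semaphore sets and 136 semaphores.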
Discussion: https://postgr.es/m/27582.1546928073@sss.pgh.pa.us
Andrew Gierth [Mon, 7 Jan 2019 18:19:46 +0000 (18:19 +0000)]
doc: document that INFO messages always go to client.
In passing add a couple of links to the message severity table.
Backpatch because it's always been this way.
Author: Karl O. Pinc <[email protected]>
Michael Paquier [Sat, 5 Jan 2019 03:49:58 +0000 (12:49 +0900)]
doc: Update RFC URLs
Consistently use the IETF HTML links instead of a random mix of
different sites and formats. This also fixes one broken link for JSON
documentation.
Tom Lane [Thu, 3 Jan 2019 22:00:08 +0000 (17:00 -0500)]
Improve ANALYZE's handling of concurrent-update scenarios.
This patch changes the rule for whether or not a tuple seen by ANALYZE
should be included in its sample.
When we last touched this logic, in commit 51e1445f1, we weren't
thinking very hard about tuples being UPDATEd by a long-running
concurrent transaction. In such a case, we might see the pre-image as
either LIVE or DELETE_IN_PROGRESS depending on timing; and we might see
the post-image not at all, or as INSERT_IN_PROGRESS. Since the existing
code will not sample either DELETE_IN_PROGRESS or INSERT_IN_PROGRESS
tuples, this leads to concurrently-updated rows being omitted from the
sample entirely. That's not very helpful, and it's especially the wrong
thing if the concurrent transaction ends up rolling back.
The right thing seems to be to sample DELETE_IN_PROGRESS rows just as if
they were live. This makes the "sample it" and "count it" decisions the
same, which seems good for consistency. It's clearly the right thing
if the concurrent transaction ends up rolling back; in effect, we are
sampling as though IN_PROGRESS transactions haven't happened yet.
Also, this combination of choices ensures maximum robustness against
the different combinations of whether and in which state we might see the
pre- and post-images of an update.
It's slightly annoying that we end up recording immediately-out-of-date
stats in the case where the transaction does commit, but on the other
hand the stats are fine for columns that didn't change in the update.
And the alternative of sampling INSERT_IN_PROGRESS rows instead seems
like a bad idea, because then the sampling would be inconsistent with
the way rows are counted for the stats report.
Per report from Mark Chambers; thanks to Jeff Janes for diagnosing
what was happening. Back-patch to all supported versions.
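The new rule can be summarized as a decision table. This is a simplified sketch: the `TupleState` values are hypothetical names modeled on HeapTupleSatisfiesVacuum() results, and the special handling of rows changed by ANALYZE's own transaction is omitted. A DELETE_IN_PROGRESS row is treated exactly like a LIVE one, so the "sample it" and "count it" decisions agree.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    TUPLE_DEAD,
    TUPLE_LIVE,
    TUPLE_RECENTLY_DEAD,
    TUPLE_INSERT_IN_PROGRESS,
    TUPLE_DELETE_IN_PROGRESS
} TupleState;

typedef struct {
    bool sample_it;     /* include in the statistics sample? */
    int  live_delta;    /* contribution to the live-row count */
    int  dead_delta;    /* contribution to the dead-row count */
} SampleDecision;

/* Decision for a tuple last changed by some *other* transaction. */
static SampleDecision decide(TupleState s)
{
    SampleDecision d = {false, 0, 0};

    switch (s)
    {
        case TUPLE_LIVE:
        case TUPLE_DELETE_IN_PROGRESS:  /* sampled as if still live */
            d.sample_it = true;
            d.live_delta = 1;
            break;
        case TUPLE_DEAD:
        case TUPLE_RECENTLY_DEAD:
            d.dead_delta = 1;
            break;
        case TUPLE_INSERT_IN_PROGRESS:  /* neither sampled nor counted */
            break;
    }
    assert(d.live_delta + d.dead_delta <= 1);
    return d;
}
```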
Discussion: https://postgr.es/m/CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com
Tom Lane [Wed, 2 Jan 2019 21:33:48 +0000 (16:33 -0500)]
Don't believe MinMaxExpr is leakproof without checking.
MinMaxExpr invokes the btree comparison function for its input datatype,
so it's only leakproof if that function is. Many such functions are
indeed leakproof, but others are not, and we should not just assume that
they are. Hence, adjust contain_leaked_vars to verify the leakproofness
of the referenced function explicitly.
I didn't add a regression test because it would need to depend on
some particular comparison function being leaky, and that's a moving
target, per discussion.
This has been wrong all along, so back-patch to supported branches.
Discussion: https://postgr.es/m/31042.1546194242@sss.pgh.pa.us
Bruce Momjian [Wed, 2 Jan 2019 17:44:25 +0000 (12:44 -0500)]
Update copyright for 2019
Backpatch-through: certain files through 9.4
Noah Misch [Mon, 31 Dec 2018 21:50:32 +0000 (13:50 -0800)]
pg_regress: Promptly detect failed postmaster startup.
Detect it the way pg_ctl's wait_for_postmaster() does. When pg_regress
spawned a postmaster that failed startup, we were detecting that only
with "pg_regress: postmaster did not respond within 60 seconds".
Back-patch to 9.4 (all supported versions).
Reviewed by Tom Lane.
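A sketch of the detection idea, not pg_regress's actual code: while polling for readiness, also call `waitpid()` with `WNOHANG` on the spawned server; if it reports that the child has exited, startup failed and there is no point in waiting out the timeout. `server_is_ready()` and `spawn_failing_child()` are hypothetical helpers for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical readiness probe (pg_regress would try to connect).
 * It never succeeds here, so only the child's exit can end the wait. */
static bool server_is_ready(void) { return false; }

/* Returns 0 if the server became ready, -1 if the child exited first
 * (failed startup), or -2 if max_tries polls elapsed. */
static int wait_for_server(pid_t child, int max_tries)
{
    struct timespec delay = {0, 10 * 1000 * 1000};   /* 10 ms */

    for (int i = 0; i < max_tries; i++)
    {
        int status;

        if (server_is_ready())
            return 0;
        /* WNOHANG: returns 0 while the child runs, its pid once exited */
        if (waitpid(child, &status, WNOHANG) == child)
            return -1;
        nanosleep(&delay, NULL);
    }
    return -2;
}

/* Demo helper: a child that "fails startup" immediately. */
static pid_t spawn_failing_child(void)
{
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0)
        _exit(1);
    return pid;
}
```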
Discussion: https://postgr.es/m/20181231172922[email protected]
Alvaro Herrera [Thu, 27 Dec 2018 19:17:40 +0000 (16:17 -0300)]
Have DISCARD ALL/TEMP remove leftover temp tables
Previously, it would only remove temp tables created in the same
session; but if the session uses the BackendId of a previously crashed
backend that left temp tables around, those would not get removed.
Since autovacuum would not drop them either (because it sees that the
BackendId is in use by the current session) these can cause annoying
xid-wraparound warnings.
Apply to branches 9.4 to 10. This is not a problem since version 11,
because commit 943576bddcb5 added state tracking that makes autovacuum
realize that those temp tables are not ours, so it removes them.
This is useful to handle in DISCARD, because even though it does not
handle all situations, it does handle the common one where a connection
pooler keeps the same session open for an indefinitely long time.
Discussion: https://postgr.es/m/20181226190834[email protected]
Reviewed-by: Takayuki Tsunakawa, Michaël Paquier
Alvaro Herrera [Thu, 27 Dec 2018 19:00:39 +0000 (16:00 -0300)]
Make autovacuum more selective about temp tables to keep
When temp tables are in danger of XID wraparound, autovacuum drops them;
however, it preserves those that are owned by a working session. This
is desirable, except when the session is connected to a different
database (because the temp tables cannot be from that session), so make
it keep the temp tables only if the backend is in the same database
as the temp tables.
This is not bulletproof: it fails to detect temp tables left by a
session whose backend ID is reused in the same database but the new
session does not use temp tables. Commit 943576bddcb5 fixes that case
too, for branches 11 and up (which is why we don't apply this fix to
those branches), but back-patching that one is not universally agreed
on.
Discussion: https://postgr.es/m/20181214162843[email protected]
Reviewed-by: Takayuki Tsunakawa, Michaël Paquier
Michael Paquier [Thu, 27 Dec 2018 01:17:42 +0000 (10:17 +0900)]
Ignore inherited temp relations from other sessions when truncating
Inheritance trees can include temporary tables if the parent is
permanent, which makes it possible to have multiple temporary
children from different sessions. Trying to issue a TRUNCATE on the
parent in this scenario causes a failure, so, as other queries already
do, just ignore such children, which makes TRUNCATE work
transparently.
This makes truncation behave like any other DML query working on
the parent table whose work must also be applied to the children. A
set of isolation tests is added to cover basic cases.
Reported-by: Zhou Digoal
Author: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/15565-ce67a48d0244436a@postgresql.org
Backpatch-through: 9.4
Tom Lane [Wed, 26 Dec 2018 20:30:10 +0000 (15:30 -0500)]
Fix portability failure introduced in commits d2b0b60e7 et al.
I made a frontend fprintf() format use %m, forgetting that that's only
safe in HEAD not the back branches; prior to
96bf88d52 and
d6c55de1f,
it would work on glibc platforms but not elsewhere. Revert to using
%s ... strerror(errno) as the code did before.
We could have left HEAD as-is, but for code consistency across branches,
I chose to apply this patch there too.
Per Coverity and a few buildfarm members.
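The portable pattern looks like this. `format_open_error()` is a hypothetical helper: `%m` is a glibc printf extension, whereas `%s` with `strerror()` works with any C library. Saving `errno` first guards against intervening calls clobbering it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Portable replacement for a "...: %m" format in frontend code. */
static int format_open_error(char *buf, size_t buflen, const char *path)
{
    int save_errno = errno;     /* capture before anything can change it */

    assert(buf != NULL && path != NULL);
    return snprintf(buf, buflen, "could not open file \"%s\": %s",
                    path, strerror(save_errno));
}
```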
Peter Eisentraut [Sat, 22 Dec 2018 06:21:40 +0000 (07:21 +0100)]
Fix ancient compiler warnings and typos in !HAVE_SYMLINK code
This has never been correct since this code was introduced.
Tom Lane [Thu, 20 Dec 2018 18:55:11 +0000 (13:55 -0500)]
Doc: fix ancient mistake in search_path documentation.
"$user" in a search_path string is replaced by CURRENT_USER not
SESSION_USER. (It actually was SESSION_USER in the initial implementation,
but we changed it shortly later, and evidently forgot to fix the docs to
match.)
Noted by [email protected]
Discussion: https://postgr.es/m/159151fb45d490c8d31ea9707e9ba99d@stdpr.ru
Tom Lane [Tue, 18 Dec 2018 16:19:39 +0000 (11:19 -0500)]
Fix ancient thinko in mergejoin cost estimation.
"rescanratio" was computed as 1 + rescanned-tuples / total-inner-tuples,
which is sensible if it's to be multiplied by total-inner-tuples or a cost
value corresponding to scanning all the inner tuples. But in reality it
was (mostly) multiplied by inner_rows or a related cost, numbers that take
into account the possibility of stopping short of scanning the whole inner
relation thanks to a limited key range in the outer relation. This'd
still make sense if we could expect that stopping short would result in a
proportional decrease in the number of tuples that have to be rescanned.
It does not, however. The argument that establishes the validity of our
estimate for that number is independent of whether we scan all of the inner
relation or stop short, and experimentation also shows that stopping short
doesn't reduce the number of rescanned tuples. So the correct calculation
is 1 + rescanned-tuples / inner_rows, and we should be sure to multiply
that by inner_rows or a corresponding cost value.
Most of the time this doesn't make much difference, but if we have
both a high rescan rate (due to lots of duplicate values) and an outer
key range much smaller than the inner key range, then the error can
be significant, leading to a large underestimate of the cost associated
with rescanning.
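A worked example makes the size of the error visible. The numbers are hypothetical: 1,000,000 inner tuples in total, 500,000 rescanned tuples, and an outer key range that limits the scan to `inner_rows` = 100,000. The old formula scales the rescans down by the fraction of the inner relation scanned (150,000 tuples processed); the corrected one does not (600,000).

```c
#include <assert.h>

/* Old: ratio computed against all inner tuples, but multiplied by the
 * (smaller) number of inner rows actually scanned. */
static double old_inner_tuples(double inner_rows, double total_inner,
                               double rescanned)
{
    double rescanratio = 1.0 + rescanned / total_inner;

    assert(total_inner > 0.0);
    return inner_rows * rescanratio;
}

/* Fixed: ratio computed against inner_rows, so the product equals
 * inner_rows + rescanned regardless of how much of the inner relation
 * is scanned. */
static double new_inner_tuples(double inner_rows, double rescanned)
{
    double rescanratio = 1.0 + rescanned / inner_rows;

    assert(inner_rows > 0.0);
    return inner_rows * rescanratio;
}
```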
Per report from Vijaykumar Jain. This thinko appears to go all the way
back to the introduction of the rescan estimation logic in commit
70fba7043, so back-patch to all supported branches.
Discussion: https://postgr.es/m/CAE7uO5hMb_TZYJcZmLAgO6iD68AkEK6qCe7i=vZUkCpoKns+EQ@mail.gmail.com
Michael Paquier [Tue, 18 Dec 2018 01:03:19 +0000 (10:03 +0900)]
Update project link of pgBadger in documentation
The project has moved to a new place.
Reported-by: Peter Neave
Discussion: https://postgr.es/m/154474118231.5066.16352227860913505754@wrigleys.postgresql.org
Michael Paquier [Mon, 17 Dec 2018 03:44:09 +0000 (12:44 +0900)]
Fix use-after-free bug when renaming constraints
This is an oversight from recent commit b13fd344. While on it, tweak
the previous test with a better name for the renamed primary key.
Detected by buildfarm member prion, which forces relation cache release
with -DRELCACHE_FORCE_RELEASE. Back-patch down to 9.4, like the previous
commit.
Michael Paquier [Mon, 17 Dec 2018 01:37:24 +0000 (10:37 +0900)]
Make constraint rename issue relcache invalidation on target relation
When a constraint gets renamed, it may have associated with it a target
relation (for example domain constraints don't have one). Not
invalidating the target relation cache when issuing the renaming can
result in issues with subsequent commands that refer to the old
constraint name using the relation cache, causing various failures. One
pattern spotted was using CREATE TABLE LIKE after a constraint
renaming.
Reported-by: Stuart <[email protected]>
Author: Amit Langote
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/2047094[email protected]
Tom Lane [Sun, 16 Dec 2018 19:51:48 +0000 (14:51 -0500)]
Make error handling in parallel pg_upgrade less bogus.
reap_child() basically ignored the possibility of either an error in
waitpid() itself or a child process failure on signal. We don't really
need to do more than report and crash hard, but proceeding as though
nothing is wrong is definitely Not Acceptable. The error report for
nonzero child exit status was pretty off-point, as well.
Noted while fooling around with child-process failure detection
logic elsewhere. It's been like this a long time, so back-patch to
all supported branches.
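A sketch of what "report and crash hard" needs to distinguish, not pg_upgrade's actual reap_child(): a `waitpid()` failure, a nonzero exit code, and termination by a signal. `reap_one_child()` and the two `spawn_*()` helpers are hypothetical, for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for one child and classify how it ended.  Returns 0 for a clean
 * exit and -1 otherwise, leaving a description in msg. */
static int reap_one_child(pid_t pid, char *msg, size_t msglen)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
    {
        snprintf(msg, msglen, "waitpid() failed: %s", strerror(errno));
        return -1;
    }
    if (WIFEXITED(status))
    {
        if (WEXITSTATUS(status) == 0)
        {
            snprintf(msg, msglen, "ok");
            return 0;
        }
        snprintf(msg, msglen, "child process exited with exit code %d",
                 WEXITSTATUS(status));
        return -1;
    }
    if (WIFSIGNALED(status))
    {
        snprintf(msg, msglen, "child process was terminated by signal %d",
                 WTERMSIG(status));
        return -1;
    }
    snprintf(msg, msglen, "child process exited with unrecognized status %d",
             status);
    return -1;
}

/* Demo helpers: a child exiting with a given code, and one killed by
 * SIGTERM (assuming SIGTERM has its default disposition). */
static pid_t spawn_exit(int code)
{
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0)
        _exit(code);
    return pid;
}

static pid_t spawn_signaled(void)
{
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0)
    {
        raise(SIGTERM);     /* default action terminates the child */
        _exit(0);           /* not reached */
    }
    return pid;
}
```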
Alexander Korotkov [Thu, 13 Dec 2018 19:32:05 +0000 (22:32 +0300)]
Fix wrong backpatching of ginRedoDeletePage() deadlock fix
Commit 19cf52e6cc changed the lock order in ginRedoDeletePage(), but did so
incorrectly due to an oversight during backpatching. This commit fixes that.
Reported-by: Bruce Momjian
Discussion: https://postgr.es/m/20181213153232.GA10664%40momjian.us
Alexander Korotkov [Thu, 13 Dec 2018 03:12:31 +0000 (06:12 +0300)]
Prevent GIN deleted pages from being reclaimed too early
When GIN vacuum deletes a posting tree page, it assumes that no concurrent
searchers can access it, thanks to ginStepRight() locking two pages at once.
However, since 9.4, searches can skip parts of posting trees by descending
from the root. That creates a risk that a page is deleted and reclaimed
before a concurrent search can access it.
This commit removes that risk by not reclaiming the page until every
transaction that might still reference it has finished. Due to binary
compatibility, we can't change GinPageOpaqueData to store the corresponding
transaction ID. Instead, we reuse the page header's pd_prune_xid field,
which is unused in index pages.
Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
Author: Andrey Borodin, Alexander Korotkov
Reviewed-by: Alexander Korotkov
Backpatch-through: 9.4
Alexander Korotkov [Thu, 13 Dec 2018 03:12:25 +0000 (06:12 +0300)]
Prevent deadlock in ginRedoDeletePage()
On a standby, ginRedoDeletePage() can run concurrently with read-only
queries. Those queries can traverse the posting tree in two ways:
1) Using rightlinks via ginStepRight(), which locks the next page before
unlocking its left sibling.
2) Using downlinks via ginFindLeafPage(), which locks at most one page at a time.
The original lock order was: page, parent, left sibling. That lock order can
deadlock with ginStepRight(). To prevent deadlock, this commit changes the
lock order to: left sibling, page, parent. Note that the position of the
parent in the locking order seems insignificant, because we only lock one
page at a time while traversing downlinks.
Reported-by: Chen Huajun
Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin
Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
Author: Alexander Korotkov
Backpatch-through: 9.4
Tom Lane [Tue, 11 Dec 2018 16:21:36 +0000 (11:21 -0500)]
Doc: improve documentation about ALTER LARGE OBJECT requirements.
Unlike other ALTER ref pages, this one neglected to mention that
ALTER OWNER requires being a member of the new owning role.
Per bug #15546 from Stefan Kadow.
Discussion: https://postgr.es/m/15546-0558c75fd2025e7c@postgresql.org
Tom Lane [Mon, 10 Dec 2018 16:12:43 +0000 (11:12 -0500)]
Add stack depth checks to key recursive functions in backend/nodes/*.c.
Although copyfuncs.c has a check_stack_depth call in its recursion,
equalfuncs.c, outfuncs.c, and readfuncs.c lacked one. This seems
unwise.
Likewise fix planstate_tree_walker(), in branches where that exists.
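The shape of the fix, in miniature: a recursive tree function checks its depth before descending, so a pathologically deep tree produces an error instead of a stack overflow. This sketch uses a hypothetical one-child `Node` and an explicit depth counter; PostgreSQL's `check_stack_depth()` instead compares actual stack usage against `max_stack_depth` and raises an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with a single child, enough to model a deep chain. */
typedef struct Node {
    struct Node *child;
} Node;

/* Count nodes recursively; return -1 instead of recursing past
 * max_depth (the stand-in for check_stack_depth()). */
static long count_nodes(const Node *n, int depth, int max_depth)
{
    long sub;

    assert(max_depth > 0);
    if (n == NULL)
        return 0;
    if (depth > max_depth)
        return -1;
    sub = count_nodes(n->child, depth + 1, max_depth);
    return sub < 0 ? -1 : sub + 1;
}
```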
Discussion: https://postgr.es/m/30253.1544286631@sss.pgh.pa.us
Tom Lane [Thu, 6 Dec 2018 20:08:44 +0000 (15:08 -0500)]
Improve our response to invalid format strings, and detect more cases.
Places that are testing for *printf failure ought to include the format
string in their error reports, since bad-format-string is one of the
more likely causes of such failure. This both makes it easier to find
and repair the mistake, and provides at least some useful info to the
user who stumbles across such a problem.
Also, tighten snprintf.c to report EINVAL for an invalid flag or
final character in a format %-spec (including the case where the
%-spec is missing a final character altogether). This seems like
better project policy, and it also allows removing an instruction
or two from the hot code path.
Back-patch the error reporting change in pvsnprintf, since it should be
harmless and may be helpful; but not the snprintf.c change.
Per discussion of bug #15511 from Ertuğrul Kahveci, which reported an
invalid translated format string. These changes don't fix that error,
but they should improve matters next time we make such a mistake.
Discussion: https://postgr.es/m/15511-1d8b6a0bc874112f@postgresql.org
Tom Lane [Thu, 29 Nov 2018 23:28:11 +0000 (18:28 -0500)]
Document handling of invalid/ambiguous timestamp input near DST boundaries.
The source code comments documented this, but the user-facing docs, not
so much. Add a section to Appendix B that discusses it.
In passing, improve a couple other things in Appendix B --- notably,
a long-obsolete claim that time zone abbreviations are looked up in
a fixed table.
Per bug #15527 from Michael Davidson.
Discussion: https://postgr.es/m/15527-f1be0b4dc99ebbe7@postgresql.org
Tom Lane [Thu, 29 Nov 2018 20:53:44 +0000 (15:53 -0500)]
Ensure static libraries have correct mod time even if ranlib messes it up.
In at least Apple's version of ranlib, the output file is updated to have
a mod time equal to the max of the timestamps of its components, and that
data only has seconds precision. On a filesystem with sub-second file
timestamp precision --- say, APFS --- this can result in the finished
static library appearing older than its input files, which causes useless
rebuilds and possible outright failures in parallel makes.
We've only seen this reported in the field from people using Apple's
ranlib with a non-Apple make, because Apple's make doesn't know about
sub-second timestamps either so it doesn't decide rebuilds are needed.
But Apple's ranlib presumably shares code with at least some BSDen,
so it's not that unlikely that the same problem could arise elsewhere.
To fix, just "touch" the output file after ranlib finishes.
We seem to need this in only one place. There are other calls of
ranlib in our makefiles, but they are working on intermediate files
whose timestamps are not actually important, or else on an installed
static library for which sub-second timestamp precision is unlikely
to matter either. (Also, so far as I can tell, Apple's ranlib doesn't
mess up the file timestamp in the latter usage anyhow.)
In passing, change "ranlib" to "$(RANLIB)" in one place that was
bypassing the make macro for no good reason.
Per bug #15525 from Jack Kelly (via Alyssa Ross).
Back-patch to all supported branches.
Discussion: https://postgr.es/m/15525-a30da084f17a1faa@postgresql.org
Michael Paquier [Thu, 29 Nov 2018 00:13:04 +0000 (09:13 +0900)]
Fix handling of synchronous replication for stopping WAL senders
This fixes an oversight from c6c3334, which introduced a stricter
ordering in the way WAL senders are stopped, to prevent WAL activity
while a shutdown checkpoint is being created. After all backends are
stopped, all WAL senders are requested to stop, which makes them cease
any activity and switch their state to stopping. Once the checkpointer
knows that all WAL senders are in a stopping state, the shutdown
checkpoint can begin, with all WAL senders activated, waiting for their
clients to flush the shutdown checkpoint record.
If a subset of WAL senders are stopping and in a sync state, other WAL
senders could still be waiting for a WAL position to be synced while
committing a transaction; however, the subset of stopping senders would
not release waiters, potentially breaking synchronous replication
guarantees. This commit makes sure that even stopping WAL senders are
able to release waiters properly.
On 9.4, this can also trigger an assertion failure when, for example,
max_wal_senders is set to 1, because the WAL sender is not able to find
itself in a synchronous state when the instance stops.
Reported-by: Paul Guo
Author: Paul Guo, Michael Paquier
Discussion: https://postgr.es/m/CAEET0ZEv8VFqT3C-cQm6byOB4r4VYWcef1J21dOX-gcVhCSpmA@mail.gmail.com
Backpatch-through: 9.4
Tomas Vondra [Wed, 28 Nov 2018 00:11:15 +0000 (01:11 +0100)]
Do not decode TOAST data for table rewrites
During table rewrites (VACUUM FULL and CLUSTER), the main heap is logged
using XLOG / FPI records, and thus (correctly) ignored in decoding.
But the associated TOAST table is WAL-logged as plain INSERT records,
and so was logically decoded and passed to reorder buffer.
That has severe consequences with TOAST tables of non-trivial size.
Firstly, reorder buffer has to keep all those changes, possibly spilling
them to a file, incurring I/O costs and consuming disk space.
Secondly, ReorderBufferCommit() was stashing all those TOAST chunks into
a hash table, which got discarded only after processing the row from the
main heap. But as the main heap is not decoded for rewrites, this never
happened, so all the TOAST data accumulated in memory, resulting either
in excessive memory consumption or OOM.
The fix is simple, as commit e9edc1ba already introduced infrastructure
(namely the HEAP_INSERT_NO_LOGICAL flag) to skip logical decoding of TOAST
tables, but it only applied it to system tables. So simply use it for
all TOAST data in raw_heap_insert().
That would, however, solve only the memory consumption issue: the TOAST
changes would still be decoded and added to the reorder buffer, and
spilled to disk (although without TOAST tuple data, so much smaller).
But we can solve that by tweaking DecodeInsert() to just ignore such
INSERT records altogether, using XLH_INSERT_CONTAINS_NEW_TUPLE flag,
instead of skipping them later in ReorderBufferCommit().
Review: Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/1a17c643-e9af-3dba-486b-fbe31bc1823a%402ndquadrant.com
Backpatch: 9.4-, where logical decoding was introduced
Bruce Momjian [Tue, 27 Nov 2018 00:41:18 +0000 (19:41 -0500)]
doc: fix wording for plpgsql, add "and"
Reported-by: Anthony Greene
Discussion: https://postgr.es/m/CAPRNmnsSZ4QL75FUjcS8ND_oV+WjgyPbZ4ch2RUwmW6PWzF38w@mail.gmail.com
Backpatch-through: 9.4
Tom Lane [Mon, 26 Nov 2018 22:32:51 +0000 (17:32 -0500)]
Fix translation of special characters in psql's LaTeX output modes.
latex_escaped_print() mistranslated \ and failed to provide any translation
for #, ^, and ~, all of which would typically lead to LaTeX document syntax
errors. In addition, it didn't translate <, >, and |, which would typically
render as unexpected characters.
To some extent this represents shortcomings in ancient versions of LaTeX,
which, if memory serves, had no easy way to render these control characters
as ASCII text. But that's been fixed for, um, decades. In any case there
is no value in emitting guaranteed-to-fail output for these characters.
Noted while fooling with test cases added by commit 9a98984f4. Back-patch
the code change to all supported versions.
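A per-character escaping table covering the characters mentioned above, plus the other common LaTeX specials, can be sketched as follows. The function names are hypothetical, and the replacement strings are standard LaTeX text-mode commands chosen for illustration; psql's latex_escaped_print() may use different ones, and it also handles newlines, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Replacement text for one input character, or NULL to copy it unchanged. */
static const char *latex_replacement(char c)
{
    switch (c)
    {
        case '#':  return "\\#";
        case '$':  return "\\$";
        case '%':  return "\\%";
        case '&':  return "\\&";
        case '_':  return "\\_";
        case '{':  return "\\{";
        case '}':  return "\\}";
        case '\\': return "\\textbackslash{}";
        case '^':  return "\\^{}";
        case '~':  return "\\~{}";
        case '<':  return "\\textless{}";
        case '>':  return "\\textgreater{}";
        case '|':  return "\\textbar{}";
        default:   return NULL;
    }
}

/* Escape in into out (capacity outsz). Returns false if out is too small. */
static bool latex_escape(const char *in, char *out, size_t outsz)
{
    size_t len = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return false;
    for (; *in; in++)
    {
        char        single[2] = { *in, '\0' };
        const char *rep = latex_replacement(*in);

        if (rep == NULL)
            rep = single;       /* ordinary character: copy as-is */
        size_t n = strlen(rep);
        if (len + n + 1 > outsz)
            return false;       /* leave room for the terminating NUL */
        memcpy(out + len, rep, n);
        len += n;
    }
    out[len] = '\0';
    return true;
}
```

For example, `x^2~y` becomes `x\^{}2\~{}y`, which LaTeX renders as the literal text rather than reporting a syntax error.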
Michael Paquier [Mon, 26 Nov 2018 07:43:19 +0000 (16:43 +0900)]
Revert "Fix typo in documentation of toast storage"
This reverts commit 058ef3a, per complaints from Magnus Hagander and Vik
Fearing.
Michael Paquier [Mon, 26 Nov 2018 06:49:23 +0000 (15:49 +0900)]
Fix typo in documentation of toast storage
Author: Nawaz Ahmed
Discussion: https://postgr.es/m/154319327168.1315.1846953598601966513@wrigleys.postgresql.org
Andrew Gierth [Sat, 24 Nov 2018 09:59:49 +0000 (09:59 +0000)]
Fix hstore hash function for empty hstores upgraded from 8.4.
Hstore data generated on pg 8.4 and pg_upgraded to current versions
remains in its original on-disk format unless modified. The same goes
for values generated by the addon hstore-new module on pre-9.0
versions. (The hstoreUpgrade function converts old values on the fly
when read in, but the on-disk value is not modified by this.)
Since old-format empty hstores (and hstore-new hstores) have
representations compatible with the new format, hstoreUpgrade thought
it could get away without modifying such values; but this breaks
hstore_hash (and the new hstore_hash_extended) which assumes
bit-perfect matching between semantically identical hstore values.
Only one bit actually differs (the "new version" flag in the count
field) but that of course is enough to break the hash.
Fix by making hstoreUpgrade unconditionally convert all old values to
new format.
Backpatch all the way, even though this changes a hash value in some
cases, because in those cases the hash value is already failing; for
example, a hash join between old- and new-format empty hstores will fail
to match, or a hash index on an hstore column containing an
old-format empty value will fail to find the value, since it will
be searching for a hash derived from a new-format datum. (There are no
known field reports of this happening, probably because hashing of
hstores has only been useful in limited circumstances and there
probably isn't much upgraded data being used this way.)
Per concerns arising from discussion of commit eb6f29141be. Original
bug is my fault.
Discussion: https://postgr.es/m/60b1fd3b-7332-40f0-7e7f-f2f04f777747%402ndquadrant.com
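Why one differing flag bit is enough to break hashing can be shown with a toy model. Assumptions: EmptyHstoreSketch holds only the count word of an empty hstore, the flag value 0x80000000 and the FNV-1a function are stand-ins chosen for this sketch (any byte-wise hash behaves the same way), and upgrade_sketch() is a hypothetical reduction of what hstoreUpgrade now does unconditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for hstore's "new version" flag in the count field. */
#define SKETCH_FLAG_NEWVERSION 0x80000000u

/* Toy model: only the count word of an empty hstore. */
typedef struct
{
    uint32_t size_;
} EmptyHstoreSketch;

/* FNV-1a over raw bytes, standing in for a bit-exact, byte-wise hash. */
static uint32_t hash_bytes(const void *p, size_t n)
{
    const unsigned char *b = p;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < n; i++)
    {
        h ^= b[i];
        h *= 16777619u;
    }
    return h;
}

/* The fix, reduced to its essence: always stamp the new-format flag. */
static void upgrade_sketch(EmptyHstoreSketch *hs)
{
    assert(hs != NULL);
    hs->size_ |= SKETCH_FLAG_NEWVERSION;
}
```

Two semantically identical empty values hash differently until the old-format one is upgraded; afterwards their bytes, and therefore their hashes, match.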
Tom Lane [Sat, 24 Nov 2018 18:53:12 +0000 (13:53 -0500)]
Update additional float4/8 expected-output files.
I forgot that the back branches have more variant files than HEAD :-(.
Per buildfarm.
Discussion: https://postgr.es/m/15519-4fc785b483201ff1@postgresql.org
Tom Lane [Sat, 24 Nov 2018 17:45:50 +0000 (12:45 -0500)]
Fix float-to-integer coercions to handle edge cases correctly.
ftoi4 and its sibling coercion functions did their overflow checks in
a way that looked superficially plausible, but actually depended on an
assumption that the MIN and MAX comparison constants can be represented
exactly in the float4 or float8 domain. That fails in ftoi4, ftoi8,
and dtoi8, resulting in a possibility that values near the MAX limit will
be wrongly converted (to negative values) when they need to be rejected.
Also, because we compared before rounding off the fractional part,
the other three functions threw errors for values that really ought
to get rounded to the min or max integer value.
Fix by doing rint() first (requiring an assumption that it handles
NaN and Inf correctly; but dtoi8 and ftoi8 were assuming that already),
and by comparing to values that should coerce to float exactly, namely
INTxx_MIN and -INTxx_MIN. Also remove some random cosmetic discrepancies
between these six functions.
This back-patches commits cbdb8b4c0 and 452b637d4. In the 9.4 branch,
also back-patch the portion of 62e2a8dc2 that added PG_INTnn_MIN and
related constants to c.h, so that these functions can rely on them.
Per bug #15519 from Victor Petrovykh.
Patch by me; thanks to Andrew Gierth for analysis and discussion.
Discussion: https://postgr.es/m/15519-4fc785b483201ff1@postgresql.org
Andrew Gierth [Fri, 23 Nov 2018 23:56:39 +0000 (23:56 +0000)]
Avoid crashes in contrib/intarray gist__int_ops (bug #15518)
1. Integer overflow in internal_size could result in memory corruption
in decompression since a zero-length array would be allocated and then
written to. This leads to crashes or corruption when traversing an
index which has been populated with sufficiently sparse values. Fix by
using int64 for computations and checking for overflow.
2. Integer overflow in g_int_compress could cause pessimal merge
choices, resulting in unnecessarily large ranges (which would in turn
trigger issue 1 above). Fix by using int64 again.
3. Even without overflow, array sizes could become large enough to
cause unexplained memory allocation errors. Fix by capping the sizes
to a safe limit and reporting actual errors pointing at gist__intbig_ops
as needed.
4. Large inputs to the compression function always consist of large
runs of consecutive integers, and the compression loop was processing
these one at a time in an O(N^2) manner with a lot of overhead. The
expected runtime of this function could easily exceed 6 months for a
single call as a result. Fix by performing a linear-time first pass,
which reduces the worst case to something on the order of seconds.
Backpatch all the way, since this has been wrong forever.
Per bug #15518 from IRC user "dymk"; analysis and patch by me.
Discussion: https://postgr.es/m/15518-799e426c3b4f8358@postgresql.org
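The linear first pass from item 4 and the int64 arithmetic from items 1 and 2 can be sketched as follows. This is an illustrative model, not the code in contrib/intarray: collapse_runs() and covered_size() are hypothetical names, the input is assumed to be sorted and free of duplicates, and ranges are stored as consecutive [lo, hi] pairs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One linear pass: each run of consecutive integers in v (sorted, no
 * duplicates) becomes a single [lo, hi] pair in ranges, which must have
 * room for 2 * n entries. Returns the number of ranges produced.
 */
static int collapse_runs(const int32_t *v, int n, int32_t *ranges)
{
    int nr = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        /* int64 arithmetic: the int32 difference could overflow */
        if (nr > 0 && (int64_t) v[i] - (int64_t) ranges[2 * nr - 1] == 1)
            ranges[2 * nr - 1] = v[i];      /* extend the current run */
        else
        {
            ranges[2 * nr] = v[i];          /* start a new range */
            ranges[2 * nr + 1] = v[i];
            nr++;
        }
    }
    return nr;
}

/*
 * Number of integers covered by nr ranges, computed in int64 in the
 * spirit of the internal_size fix. Returns -1 if it does not fit in int32.
 */
static int64_t covered_size(const int32_t *ranges, int nr)
{
    int64_t total = 0;

    for (int i = 0; i < nr; i++)
        total += (int64_t) ranges[2 * i + 1] - (int64_t) ranges[2 * i] + 1;
    return total > INT32_MAX ? -1 : total;
}
```

For input {1, 2, 3, 5, 6, 10} the pass yields the ranges [1,3], [5,6] and [10,10] in a single scan, instead of merging adjacent elements one at a time.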
Bruce Momjian [Wed, 21 Nov 2018 22:20:15 +0000 (17:20 -0500)]
doc: adjust time zone names text, v2
Removed one too many words. Fix for 7906de847f229f391b9e6b5892b4b4a89f29edb4.
Reported-by: Thomas Munro
Backpatch-through: 9.4
Bruce Momjian [Wed, 21 Nov 2018 21:55:39 +0000 (16:55 -0500)]
doc: adjust time zone names text
Reported-by: Kevin
Discussion: https://postgr.es/m/154082462281.30897.14043119084654378035@wrigleys.postgresql.org
Backpatch-through: 9.4
Peter Eisentraut [Tue, 13 Nov 2018 09:42:43 +0000 (10:42 +0100)]
doc: Clarify CREATE TYPE ENUM documentation
The documentation claimed that an enum type requires "one or more"
labels, but since 1fd9883ff49, zero labels are also allowed.
Reported-by: Lukas Eder
Bug: #15356
Tom Lane [Tue, 20 Nov 2018 01:01:35 +0000 (20:01 -0500)]
Fix old TAP tests' method for selecting a valid PGPORT value.
This code was trying to be paranoid, but it wasn't paranoid enough.
It only ensured that the selected port is in 0..65535, while most
Unix systems will refuse unprivileged attempts to use TCP port numbers
below 1024.
Change it to allow specification of ports 1024..65535; if the port is
outside that range, map it into 49152..65535, which is the port range
used by our later branches.
The main reason we've not noticed this up to now is that it's not
important when testing over Unix-socket connections, only TCP,
and most of our test code deliberately prevents the postmaster from
opening any TCP ports. However, the SSL tests do open up a TCP port,
and I believe this explains why buildfarm member chipmunk has been
failing the SSL tests in 9.5: it's picking a reserved port number.
Patch in 9.5 and 9.4. Later branches do not use this code.
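The port rule can be restated in C rather than the Perl of the TAP modules, as a hypothetical sketch: select_test_port() is not from the source, and folding out-of-range values with a modulo over the 16384-port window is one plausible way to "map it into 49152..65535"; the actual Perl code may compute the mapping differently.

```c
#include <assert.h>

/*
 * Hypothetical sketch: keep a requested port in 1024..65535, otherwise
 * fold it into the 16384-port window 49152..65535.
 */
static int select_test_port(long requested)
{
    if (requested >= 1024 && requested <= 65535)
        return (int) requested;

    /* C's % may yield a negative remainder, so normalize into 0..16383. */
    long offset = ((requested % 16384) + 16384) % 16384;

    assert(offset >= 0 && offset < 16384);
    return (int) (49152 + offset);
}
```

A privileged port such as 80 maps to 49232, while an ordinary choice such as 5432 is used unchanged.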
Tom Lane [Mon, 19 Nov 2018 19:24:52 +0000 (14:24 -0500)]
Back-patch updated thread flags tests into 9.4 and 9.5.
This commit back-patches these 9.6-era commits into 9.4 and 9.5:
e97af6c8b Replace our hacked version of ax_pthread.m4 with latest upstream version.
3b14a17c8 Move pthread-tests earlier in the autoconf script.
01051a987 Use AS_IF rather than plain shell "if" in pthread-check.
a2932283c Update ax_pthread.m4 to an experimental draft version from upstream.
The net result is to sync configure's checks for threading-related
flags and libraries with the version we've been using since 9.6.
The motivation for doing so now is that it seems the older code does
not work correctly on very recent RHEL7/ppc64, as evidenced by
buildfarm member quokka. The newer code is pretty battle-hardened
by now, so this seems like a low-risk fix.
Discussion: https://postgr.es/m/3320.1542647565@sss.pgh.pa.us
Tom Lane [Mon, 19 Nov 2018 17:01:47 +0000 (12:01 -0500)]
Fix configure's AC_CHECK_DECLS tests to work correctly with clang.
The test case that Autoconf uses to discover whether a function has
been declared doesn't work reliably with clang, because clang reports
a warning, not an error, if the name is a known built-in function.
On some platforms, this results in a lot of compile-time warnings about
strlcpy and related functions not having been declared.
There is a fix for this (by Noah Misch) in the upstream Autoconf sources,
but since they've not made a release in years and show no indication of
doing so anytime soon, let's just absorb their fix directly. We can
revert this when and if we update to a newer Autoconf release.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/26819.1542515567@sss.pgh.pa.us
Thomas Munro [Mon, 19 Nov 2018 00:40:57 +0000 (13:40 +1300)]
PANIC on fsync() failure.
On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure. In that case the only remaining copy of the
data is in the WAL. A subsequent fsync() could appear to succeed,
but not have flushed the data. That means that a future checkpoint
could apparently complete successfully but have lost data.
Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure. Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.
Provide a GUC, data_sync_retry, to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
for forensic/testing uses. If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail. The GUC defaults
to off, meaning that we panic.
Back-patch to all supported releases.
There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error. A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.
Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
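A minimal sketch of the escalation policy, assuming hypothetical names and severity codes; the text above establishes only the GUC name data_sync_retry and its default of off. The caller fsync()s a data file and, on failure, reports PANIC unless retrying has been enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical severity codes for this sketch, not PostgreSQL's values. */
enum { SKETCH_ERROR = 20, SKETCH_PANIC = 22 };

/* Models the data_sync_retry GUC; off by default, so failures PANIC. */
static bool data_sync_retry = false;

/* Escalate a data-file sync failure to PANIC unless retrying is allowed. */
static int data_sync_elevel_sketch(int elevel)
{
    assert(elevel < SKETCH_PANIC);
    return data_sync_retry ? elevel : SKETCH_PANIC;
}

/* Returns 0 on success, otherwise the severity the caller should report. */
static int sync_data_file(int fd)
{
    if (fsync(fd) == 0)
        return 0;

    /*
     * The kernel may already have dropped the dirty pages, so a retried
     * fsync() could "succeed" without the data ever reaching disk.
     */
    return data_sync_elevel_sketch(SKETCH_ERROR);
}
```

With the default setting, any fsync() failure (for example EBADF on an invalid descriptor) yields SKETCH_PANIC; turning data_sync_retry on downgrades it to an ordinary error so that a later checkpoint can try again.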