Age | Commit message (Collapse) | Author |
|
Per POSIX, a caller of strtol() that wishes to check for errors must
set errno to 0 beforehand. Several places in spell.c neglected that,
so that they risked delivering a false overflow error in case errno
had been ERANGE already. Given the lack of field reports, this case
may be unreachable at present --- but it's surely trouble waiting to
happen, so fix it.
Author: Jacob Brazeal <[email protected]>
Discussion: https://postgr.es/m/CA+COZaBhsq6EromFm+knMJfzK6nTpG23zJ+K2=nfUQQXcj_xcQ@mail.gmail.com
Backpatch-through: 13
|
|
This function is used in both vacuum and analyze code paths, and a
follow-up commit will require distinguishing between the two. This
commit forces callers to specify whether they are in a vacuum or
analyze path, but it does not use that information for anything
yet.
Author: Nathan Bossart <[email protected]>
Co-authored-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
Most were introduced in the 17 timeframe. The ones in wparser_def.c are
very old.
I also changed "JSON path expression for column \"%s\" should return
single item without wrapper" to "JSON path expression for column \"%s\"
must return single item when no wrapper is requested" to avoid
ambiguity.
Backpatch to 17.
Crickets: https://postgr.es/m/[email protected]
|
|
Backpatch-through: 13
|
|
lowerstr() and lowerstr_with_len() in ts_locale.c do the same thing as
str_tolower() that the rest of the system uses, except that the former
don't use the common locale provider framework but instead use the
global libc locale settings.
This patch replaces uses of lowerstr*() with str_tolower(...,
DEFAULT_COLLATION_OID). For instances that use a libc locale
globally, this will result in exactly the same behavior. For
instances that use other locale providers, you now get consistent
behavior and are no longer dependent on the libc locale settings (for
this case; there are others).
Most uses of these functions are for processing dictionary and
configuration files. In those cases, using the default collation
seems appropriate. At least we don't have a more specific collation
available. But the code in contrib/pg_trgm should really depend on
the collation of the columns being processed. This is not done here,
this can be done in a separate patch.
(You can probably construct some edge cases where this change would
create some locale-related upgrade incompatibility, for example if
before you used a combination of ICU and a differently-behaving libc
locale. We can document this in the release notes, but I don't think
there is anything more we can do about this.)
Reviewed-by: Jeff Davis <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
|
|
These do the same thing as the standard isdigit(), isspace(), and
isprint() but with multibyte and encoding support. But all the
callers are only interested in analyzing single-byte ASCII characters.
So this extra layer is overkill and we can replace the uses with the
standard functions.
All the t_is*() functions in ts_locale.c are under scrutiny because
they don't use the common locale provider framework but instead use
the global libc locale settings. For the functions being touched by
this patch, we don't need all that anyway, as mentioned above, so the
simplest solution is to just remove them. The few remaining t_is*()
functions will need a different treatment in a separate patch.
pg_trgm has some compile-time options with macros such as
KEEPONLYALNUM. These are not documented, and the non-default variant
is not supported by any test cases. As part of this undertaking, I'm
removing the non-default variant, as it is in the way of cleanup. So
in this case, the not-KEEPONLYALNUM code path is gone.
Reviewed-by: Jeff Davis <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
|
|
Many of them just seem to have been copied around for no real reason.
Their presence causes (small) risks of hiding actual type mismatches
or silently discarding qualifiers
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
|
|
I started by marking VoidString as const, and fixing the fallout by
marking more fields and function arguments as const. It proliferated
quite a lot, but all within spell.c and spell.h.
A more narrow patch to get rid of the static VoidString buffer would
be to replace it with '#define VoidString ""', as C99 allows assigning
"" to a non-const pointer, even though you're not allowed to modify
it. But it seems like good hygiene to mark all these as const. In the
structs, the pointers can point to the constant VoidString, or a
buffer allocated with palloc(), or with compact_palloc(), so you
should not modify them.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/[email protected]
|
|
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
|
|
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 12
|
|
The initial estimate of the number of distinct ParsedWords is just
that: an estimate. Don't let it exceed what palloc is willing to
allocate. If in fact we need more entries, we'll eventually fail
trying to enlarge the array. But if we don't, this allows success on
inputs that currently draw "invalid memory alloc request size".
Per bug #18080 from Uwe Binder. Back-patch to all supported branches.
Discussion: https://postgr.es/m/[email protected]
|
|
The VacAttrStats structure contained the whole Form_pg_attribute for a
column, but it actually only needs attstattarget from there. So
remove the Form_pg_attribute field and make a separate field for
attstattarget. This simplifies some code for extended statistics that
doesn't deal with a column but an expression, which had to fake up
pg_attribute rows to satisfy internal APIs. Also, we can remove some
comments that essentially said "don't look at pg_attribute directly".
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org
|
|
TParserGet() recurses for some token types, meaning it's possible
to drive it to stack overflow. Since this is a minority behavior,
I chose to add the check_stack_depth() call to the two places that
recurse rather than doing it during every single call.
While at it, add CHECK_FOR_INTERRUPTS(), because this can run
unpleasantly long for long inputs.
Per bug #17995 from Zuming Jiang. This is old, so back-patch
to all supported branches.
Discussion: https://postgr.es/m/[email protected]
|
|
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/[email protected]
|
|
|
|
Now that our regex engine uses palloc(), it's not necessary to set up a
special memory context callback to free compiled regexes. The regex has
no resources other than the memory that is already going to be freed in
bulk.
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
|
|
tsquery's GETQUERY() macro is only safe to apply to a tsquery
that is known non-empty; otherwise it gives a pointer to garbage.
Before commit 5a617d75d, ts_headline() avoided this pitfall, but
only in a very indirect, nonobvious way. (hlCover could not reach
its TS_execute call, because if the query contains no lexemes
then hlFirstIndex would surely return -1.) After that commit,
it fell into the trap, resulting in weird errors such as
"unrecognized operator" and/or valgrind complaints. In HEAD,
fix this by not calling TS_execute_locations() at all for an
empty query. In the back branches, add a defensive check to
hlCover() --- that's not fixing any live bug, but I judge the
code a bit too fragile as-is.
Also, both mark_hl_fragments() and mark_hl_words() were careless
about the possibility of empty search text: in the cases where
no match has been found, they'd end up telling mark_fragment() to
mark from word indexes 0 to 0 inclusive, even when there is no
word 0. This is harmless since we over-allocated the prs->words
array, but it does annoy valgrind. Fix so that the end index is -1
and thus mark_fragment() will do nothing in such cases.
Bottom line is that this fixes a live bug in HEAD, but in the
back branches it's only getting rid of a valgrind nitpick.
Back-patch anyway.
Per report from Alexander Lakhin.
Discussion: https://postgr.es/m/[email protected]
|
|
Check whether the datctype is C to determine whether t_isspace() and
related functions use isspace() or iswspace().
Previously, t_isspace() checked whether the database default collation
was C; which is incorrect when the default collation uses the ICU
provider.
Discussion: https://postgr.es/m/[email protected]
Reviewed-by: Peter Eisentraut
Backpatch-through: 15
|
|
The affected functions are: bsearch, memcmp, memcpy, memset, memmove,
qsort, repalloc
Reviewed-by: Corey Huinker <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
|
|
Some of these appear to be leftovers from when hash_search() took a
char * argument (changed in 5999e78fc45dcb91784b64b6e9ae43f4e4f68ca2).
Since after this there is some more horizontal space available, do
some light reformatting where suitable.
Reviewed-by: Corey Huinker <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
|
|
This patch largely reverts what I did in commits c9b0c678d and
78e73e875. The maximum cover length limit that I added in 78e73e875
(to band-aid over c9b0c678d's performance issues) creates too many
user-visible behavior discrepancies, as complained of for example in
bug #17691. The real problem with hlCover() is not what I thought
at the time, but more that it seems to have been designed with only
AND tsquery semantics in mind. It doesn't work quite right for OR,
and even less so for NOT or phrase queries. However, we can improve
that situation by building a variant of TS_execute() that returns a
list of match locations. We already get an ExecPhraseData struct
representing match locations for the primitive case of a simple match,
as well as one for a phrase match; we just need to add some logic to
combine these for AND and OR operators. The result is a list of
ExecPhraseDatas, which hlCover can regard as having simple AND
semantics, so that its old algorithm works correctly.
There's still a lot not to like about ts_headline's behavior, but
I think the remaining issues have to do with the heuristics used
in mark_hl_words and mark_hl_fragments (which, likewise, were not
revisited when phrase search was added). Improving those is a task
for another day.
Patch by me; thanks to Alvaro Herrera for review.
Discussion: https://postgr.es/m/[email protected]
|
|
This new header contains all the variable-length data types support
(TOAST support) from postgres.h, which isn't needed by large parts of
the backend code.
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/ddcce239-0f29-6e62-4b47-1f8ca742addf%40enterprisedb.com
|
|
Backpatch-through: 11
|
|
This is not really complete, but it catches most cases of practical
interest. The main omissions are:
* regtype, regprocedure, and regoperator parse type names by
calling the main grammar, so any grammar-detected syntax error
will still be a hard error. Also, if one includes a type
modifier in such a type specification, errors detected by the
typmodin function will be hard errors.
* Lookup errors are handled just by passing missing_ok = true
to the relevant catalog lookup function. Because we've used
quite a restrictive definition of "missing_ok", this means that
edge cases such as "the named schema exists, but you lack
USAGE permission on it" are still hard errors.
It would make sense to me to replace most/all missing_ok
parameters with an escontext parameter and then allow these
additional lookup failure cases to be trapped too. But that's
a job for some other day.
Discussion: https://postgr.es/m/[email protected]
|
|
This is slightly tedious because the adjustments cascade through
a couple of levels of subroutines, but it's not very hard.
I chose to avoid changing function signatures more than absolutely
necessary, by passing the escontext pointer in existing structs
where possible.
tsquery's nuisance NOTICEs about empty queries are suppressed in
soft-error mode, since they're not errors and we surely don't want
them to be shown to the user anyway. Maybe that whole behavior
should be reconsidered.
Discussion: https://postgr.es/m/[email protected]
|
|
This shaves some code by replacing the combinations of
CreateTemplateTupleDesc()/TupleDescInitEntry() hardcoding a mapping of
the attributes listed in pg_proc.dat by get_call_result_type() to build
the TupleDesc needed for the rows generated.
get_call_result_type() is more expensive than the former style, but this
removes some duplication with the lists of OUT parameters (pg_proc.dat
and the attributes hardcoded in these code paths). This is applied to
functions that are not considered as critical (aka that could be called
repeatedly for monitoring purposes).
Author: Bharath Rupireddy
Reviewed-by: Robert Haas, Álvaro Herrera, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACV23HW5HP5hFjd89FNS-z5X8r2jNXdMXcpN2BgTtKd87w@mail.gmail.com
|
|
Discussion: https://postgr.es/m/[email protected]
|
|
I just spent an annoying amount of time reverse-engineering the
100%-undocumented API between ts_headline and the text search
parser's prsheadline function. Add some commentary about that
while it's fresh in mind. Also remove some unused macros in
wparser_def.c.
While at it, I noticed that when commit 78e73e875 added a
CHECK_FOR_INTERRUPTS call in TS_execute_recurse, it missed
doing so in the parallel function TS_phrase_execute, which
surely needs one just as much.
Back-patch because of the missing CHECK_FOR_INTERRUPTS.
Might as well back-patch the rest of this too.
|
|
ts_locale.c omitted support for "isalnum" tests, perhaps on the
grounds that there were initially no use-cases for that. However,
both ltree and pg_trgm need such tests, and we do also have one
use-case now in the core backend. The workaround of testing
isalpha and isdigit separately seems quite inefficient, especially
when dealing with multibyte characters; so let's fill in the
missing support.
Discussion: https://postgr.es/m/[email protected]
|
|
In a similar effort to f01592f91, here we mostly rename shadowed local
variables to remove the warnings produced when compiling with
-Wshadow=compatible-local.
This fixes 63 warnings and leaves just 5.
Author: Justin Pryzby, David Rowley
Reviewed-by: Justin Pryzby
Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
|
|
The previous macro implementations just cast the argument to a target
type but did not check whether the input type was appropriate. The
function implementation can do better type checking of the input type.
For the *GetDatumFast() macros, converting to an inline function
doesn't work in the !USE_FLOAT8_BYVAL case, but we can use
AssertVariableIsOfTypeMacro() to get a similar level of type checking.
Reviewed-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
|
|
Autoconf is showing its age, fewer and fewer contributors know how to wrangle
it. Recursive make has a lot of hard to resolve dependency issues and slow
incremental rebuilds. Our home-grown MSVC build system is hard to maintain for
developers not using Windows and runs tests serially. While these and other
issues could individually be addressed with incremental improvements, together
they seem best addressed by moving to a more modern build system.
After evaluating different build system choices, we chose to use meson, to a
good degree based on the adoption by other open source projects.
We decided that it's more realistic to commit a relatively early version of
the new build system and mature it in tree.
This commit adds an initial version of a meson based build system. It supports
building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD,
Solaris and Windows (however only gcc is supported on aix, solaris). For
Windows/MSVC postgres can now be built with ninja (faster, particularly for
incremental builds) and msbuild (supporting the visual studio GUI, but
building slower).
Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM
bitcode generation, documentation adjustments) are done in subsequent commits
requiring further review. Other aspects (e.g. not installing test-only
extensions) are not yet addressed.
When building on Windows with msbuild, builds are slower when using a visual
studio version older than 2019, because those versions do not support
MultiToolTask, required by meson for intra-target parallelism.
The plan is to remove the MSVC specific build system in src/tools/msvc soon
after reaching feature parity. However, we're not planning to remove the
autoconf/make build system in the near future. Likely we're going to keep at
least the parts required for PGXS to keep working around until all supported
versions build with meson.
Some initial help for postgres developers is at
https://wiki.postgresql.org/wiki/Meson
With contributions from Thomas Munro, John Naylor, Stone Tickle and others.
Author: Andres Freund <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Author: Peter Eisentraut <[email protected]>
Reviewed-By: Peter Eisentraut <[email protected]>
Discussion: https://postgr.es/m/[email protected]
|
|
guc.c has grown to be one of our largest .c files, making it
a bottleneck for compilation. It's also acquired a bunch of
knowledge that'd be better kept elsewhere, because of our not
very good habit of putting variable-specific check hooks here.
Hence, split it up along these lines:
* guc.c itself retains just the core GUC housekeeping mechanisms.
* New file guc_funcs.c contains the SET/SHOW interfaces and some
SQL-accessible functions for GUC manipulation.
* New file guc_tables.c contains the data arrays that define the
built-in GUC variables, along with some already-exported constant
tables.
* GUC check/assign/show hook functions are moved to the variable's
home module, whenever that's clearly identifiable. A few hard-
to-classify hooks ended up in commands/variable.c, which was
already a home for miscellaneous GUC hook functions.
To avoid cluttering a lot more header files with #include "guc.h",
I also invented a new header file utils/guc_hooks.h and put all
the GUC hook functions' declarations there, regardless of their
originating module. That allowed removal of #include "guc.h"
from some existing headers. The fallout from that (hopefully
all caught here) demonstrates clearly why such inclusions are
best minimized: there are a lot of files that, for example,
were getting array.h at two or more levels of remove, despite
not having any connection at all to GUCs in themselves.
There is some very minor code beautification here, such as
renaming a couple of inconsistently-named hook functions
and improving some comments. But mostly this just moves
code from point A to point B and deals with the ensuing
needs for #include adjustments and exporting a few functions
that previously weren't exported.
Patch by me, per a suggestion from Andres Freund; thanks also
to Michael Paquier for the idea to invent guc_funcs.c.
Discussion: https://postgr.es/m/[email protected]
|
|
This reverts commit 595836e99bf1ee6d43405b885fb69bb8c6d3ee23.
It has problems when USE_FLOAT8_BYVAL is off.
|
|
The previous macro implementations just cast the argument to a target
type but did not check whether the input type was appropriate. The
function implementation can do better type checking of the input type.
Reviewed-by: Aleksander Alekseev <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
|
|
SplitToVariants() in the ispell code, lseg_inside_poly() in geo_ops.c,
and regex_selectivity_sub() in selectivity estimation could recurse
until stack overflow; fix by adding check_stack_depth() calls.
So could next() in the regex compiler, but that case is better fixed by
converting its tail recursion to a loop. (We probably get better code
that way too, since next() can now be inlined into its sole caller.)
There remains a reachable stack overrun in the Turkish stemmer, but
we'll need some advice from the Snowball people about how to fix that.
Per report from Egor Chindyaskin and Alexander Lakhin. These mistakes
are old, so back-patch to all supported branches.
Richard Guo and Tom Lane
Discussion: https://postgr.es/m/[email protected]
|
|
These should have been included in 421892a19 as these shadowed variable
warnings can also be fixed by adjusting the scope of the shadowed variable
to put the declaration for it in an inner scope.
This is part of the same effort as f01592f91.
By my count, this takes the warning count from 114 down to 106.
Author: David Rowley and Justin Pryzby
Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
|
|
Justin Pryzby reported that some scenarios could cause gathering
of extended statistics to spend many seconds in an un-cancelable
qsort() operation. To fix, invent qsort_interruptible(), which is
just like qsort_arg() except that it will also do CHECK_FOR_INTERRUPTS
every so often. This bloats the backend by a couple of kB, which
seems like a good investment. (We considered just enabling
CHECK_FOR_INTERRUPTS in the existing qsort and qsort_arg functions,
but there are some callers for which that'd demonstrably be unsafe.
Opt-in seems like a better way.)
For now, just apply qsort_interruptible() in statistics collection.
There's probably more places where it could be useful, but we can
always change other call sites as we find problems.
Back-patch to v14. Before that we didn't have extended stats on
expressions, so that the problem was less severe. Also, this patch
depends on the sort_template infrastructure introduced in v14.
Tom Lane and Justin Pryzby
Discussion: https://postgr.es/m/[email protected]
|
|
There were many calls to construct_array() and deconstruct_array() for
built-in types, for example, when dealing with system catalog columns.
These all hardcoded the type attributes necessary to pass to these
functions.
To simplify this a bit, add construct_array_builtin(),
deconstruct_array_builtin() as wrappers that centralize this hardcoded
knowledge. This simplifies many call sites and reduces the amount of
hardcoded stuff that is spread around.
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/2914356f-9e5f-8c59-2995-5997fc48bcba%40enterprisedb.com
|
|
These are useless and distracting. We wouldn't have written the code
with them to begin with, so there's no reason to keep them.
Author: Justin Pryzby <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
|
|
Author: Justin Pryzby
Discussion: https://postgr.es/m/[email protected]
|
|
Backpatch-through: 10
|
|
This is similar to fd0625c, taking care of any remaining code paths that
are worth the cleanup. This also changes some cases using opposite
expression patterns.
Author: Justin Pryzby, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoCdF8dnUvr-BUWWGvA_XhKSoANacBMZb6jKyCk4TYfQ2Q@mail.gmail.com
|
|
All the code paths simplified here were already using a boolean or used
an expression that led to zero or one, making the extra bits
unnecessary.
Author: Justin Pryzby
Reviewed-by: Tom Lane, Michael Paquier, Peter Smith
Discussion: https://postgr.es/m/[email protected]
|
|
Several places were performing a tight loop to determine the first power
of 2 number that's > or >= the required memory. Instead of using a loop
for that, we can use pg_nextpower2_32 or pg_nextpower2_64. When we need a
power of 2 number equal to or greater than a given amount, we just pass
the amount to the nextpower2 function. When we need a power of 2 greater
than the amount, we just pass the amount + 1.
Additionally, in tsearch there were a couple of locations that were
performing a while loop when a simple "if" would have done. In both of
these locations only 1 item is being added, so the loop could only have
ever iterated once. Changing the loop into an if statement makes the code
very slightly more optimal as the condition is checked once rather than
twice.
There are quite a few remaining locations that increase the size of the
buffer in the following form:
while (reqsize >= buflen)
{
buflen *= 2;
buf = repalloc(buf, buflen);
}
These are not touched in this commit. repalloc will error out for sizes
larger than MaxAllocSize. Changing these to use pg_nextpower2_32 would
remove the chance of that error being raised. It's unclear from the code
if the sizes could ever become that large, so err on the side of caution.
Discussion: https://postgr.es/m/CAApHDvp=tns7RL4PH0ZR0M+M-YFLquK7218x=0B_zO+DbOma+w@mail.gmail.com
Reviewed-by: Zhihong Yu
|
|
Author: Justin Pryzby
Discussion: https://postgr.es/m/[email protected]
|
|
The text search cache mechanisms assume that we can clean up
an invalidated dictionary cache entry simply by resetting the
associated long-lived memory context. However, that does not work
for ispell affixes that make use of regular expressions, because
the regex library deals in plain old malloc. Hence, we leaked
compiled regex(es) any time we dropped such a cache entry. That
could quickly add up, since even a fairly trivial regex can use up
tens of kB, and a large one can eat megabytes. Add a memory context
callback to ensure that a regex gets freed when its owning cache
entry is cleared.
Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.
Discussion: https://postgr.es/m/[email protected]
|
|
When to_tsquery() or websearch_to_tsquery() meet a complex morph containing
multiple words residing adjacent position, these words are connected
with OP_AND operator. That leads to surprising results. For instace,
both websearch_to_tsquery('"pg_class pg"') and to_tsquery('pg_class <-> pg')
produce '( pg & class ) <-> pg' tsquery. This tsquery requires
'pg' and 'class' words to reside on the same position and doesn't match
to to_tsvector('pg_class pg'). It appears to be ridiculous behavior, which
needs to be fixed.
This commit makes to_tsquery() or websearch_to_tsquery() connect words
residing adjacent position with OP_PHRASE. Therefore, now those words are
normally chained with other OP_PHRASE operator. The examples of above now
produces 'pg <-> class <-> pg' tsquery, which matches to
to_tsvector('pg_class pg').
Another effect of this commit is that complex morph word positions now need to
match the tsvector even if there is no surrounding OP_PHRASE. This behavior
change generally looks like an improvement but making this commit not
backpatchable.
Reported-by: Barry Pederson
Bug: #16592
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/CAPpHfdv0EzVhf6CWfB1_TTZqXV_2Sn-jSY3zSd7ePH%3D-%2B1V2DQ%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Tom Lane, Neil Chen
|
|
Backpatch-through: 9.5
|
|
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).
Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes. This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)
Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize. Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.
Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one. This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not. We might as well save a few
cycles by standardizing on "not".
Also improve the documentation for hash_create().
In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.
Discussion: https://postgr.es/m/[email protected]
|