summaryrefslogtreecommitdiff
path: root/src/backend/tsearch
AgeCommit message (Collapse)Author
2025-03-08Clear errno before calling strtol() in spell.c.Tom Lane
Per POSIX, a caller of strtol() that wishes to check for errors must set errno to 0 beforehand. Several places in spell.c neglected that, so that they risked delivering a false overflow error in case errno had been ERANGE already. Given the lack of field reports, this case may be unreachable at present --- but it's surely trouble waiting to happen, so fix it. Author: Jacob Brazeal <[email protected]> Discussion: https://postgr.es/m/CA+COZaBhsq6EromFm+knMJfzK6nTpG23zJ+K2=nfUQQXcj_xcQ@mail.gmail.com Backpatch-through: 13
2025-02-11Add is_analyze parameter to vacuum_delay_point().Nathan Bossart
This function is used in both vacuum and analyze code paths, and a follow-up commit will require distinguishing between the two. This commit forces callers to specify whether they are in a vacuum or analyze path, but it does not use that information for anything yet. Author: Nathan Bossart <[email protected]> Co-authored-by: Bertrand Drouvot <[email protected]> Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
2025-01-21Reword recent error messages: "should" -> "must"Álvaro Herrera
Most were introduced in the 17 timeframe. The ones in wparser_def.c are very old. I also changed "JSON path expression for column \"%s\" should return single item without wrapper" to "JSON path expression for column \"%s\" must return single item when no wrapper is requested" to avoid ambiguity. Backpatch to 17. Crickets: https://postgr.es/m/[email protected]
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-12-17Remove ts_locale.c's lowerstr()Peter Eisentraut
lowerstr() and lowerstr_with_len() in ts_locale.c do the same thing as str_tolower() that the rest of the system uses, except that the former don't use the common locale provider framework but instead use the global libc locale settings. This patch replaces uses of lowerstr*() with str_tolower(..., DEFAULT_COLLATION_OID). For instances that use a libc locale globally, this will result in exactly the same behavior. For instances that use other locale providers, you now get consistent behavior and are no longer dependent on the libc locale settings (for this case; there are others). Most uses of these functions are for processing dictionary and configuration files. In those cases, using the default collation seems appropriate. At least we don't have a more specific collation available. But the code in contrib/pg_trgm should really depend on the collation of the columns being processed. This is not done here, this can be done in a separate patch. (You can probably construct some edge cases where this change would create some locale-related upgrade incompatibility, for example if before you used a combination of ICU and a differently-behaving libc locale. We can document this in the release notes, but I don't think there is anything more we can do about this.) Reviewed-by: Jeff Davis <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
2024-12-17Remove ts_locale.c's t_isdigit(), t_isspace(), t_isprint()Peter Eisentraut
These do the same thing as the standard isdigit(), isspace(), and isprint() but with multibyte and encoding support. But all the callers are only interested in analyzing single-byte ASCII characters. So this extra layer is overkill and we can replace the uses with the standard functions. All the t_is*() functions in ts_locale.c are under scrutiny because they don't use the common locale provider framework but instead use the global libc locale settings. For the functions being touched by this patch, we don't need all that anyway, as mentioned above, so the simplest solution is to just remove them. The few remaining t_is*() functions will need a different treatment in a separate patch. pg_trgm has some compile-time options with macros such as KEEPONLYALNUM. These are not documented, and the non-default variant is not supported by any test cases. As part of this undertaking, I'm removing the non-default variant, as it is in the way of cleanup. So in this case, the not-KEEPONLYALNUM code path is gone. Reviewed-by: Jeff Davis <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
2024-11-28Remove useless casts to (void *)Peter Eisentraut
Many of them just seem to have been copied around for no real reason. Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2024-08-06Constify fields and parameters in spell.cHeikki Linnakangas
I started by marking VoidString as const, and fixing the fallout by marking more fields and function arguments as const. It proliferated quite a lot, but all within spell.c and spell.h. A more narrow patch to get rid of the static VoidString buffer would be to replace it with '#define VoidString ""', as C99 allows assigning "" to a non-const pointer, even though you're not allowed to modify it. But it seems like good hygiene to mark all these as const. In the structs, the pointers can point to the constant VoidString, or a buffer allocated with palloc(), or with compact_palloc(), so you should not modify them. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/[email protected]
2024-03-04Remove unused #include's from backend .c filesPeter Eisentraut
as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-01-04Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/[email protected] Backpatch-through: 12
2023-09-25Limit to_tsvector_byid's initial array allocation to something sane.Tom Lane
The initial estimate of the number of distinct ParsedWords is just that: an estimate. Don't let it exceed what palloc is willing to allocate. If in fact we need more entries, we'll eventually fail trying to enlarge the array. But if we don't, this allows success on inputs that currently draw "invalid memory alloc request size". Per bug #18080 from Uwe Binder. Back-patch to all supported branches. Discussion: https://postgr.es/m/[email protected]
2023-07-03Take pg_attribute out of VacAttrStatsPeter Eisentraut
The VacAttrStats structure contained the whole Form_pg_attribute for a column, but it actually only needs attstattarget from there. So remove the Form_pg_attribute field and make a separate field for attstattarget. This simplifies some code for extended statistics that doesn't deal with a column but an expression, which had to fake up pg_attribute rows to satisfy internal APIs. Also, we can remove some comments that essentially said "don't look at pg_attribute directly". Reviewed-by: Tomas Vondra <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org
2023-06-24Check for interrupts and stack overflow in TParserGet().Tom Lane
TParserGet() recurses for some token types, meaning it's possible to drive it to stack overflow. Since this is a minority behavior, I chose to add the check_stack_depth() call to the two places that recurse rather than doing it during every single call. While at it, add CHECK_FOR_INTERRUPTS(), because this can run unpleasantly long for long inputs. Per bug #17995 from Zuming Jiang. This is old, so back-patch to all supported branches. Discussion: https://postgr.es/m/[email protected]
2023-05-19Pre-beta mechanical code beautification.Tom Lane
Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/[email protected]
2023-05-19Remove stray mid-sentence tabs in commentsPeter Eisentraut
2023-04-08Update tsearch regex memory management.Thomas Munro
Now that our regex engine uses palloc(), it's not necessary to set up a special memory context callback to free compiled regexes. The regex has no resources other than the memory that is already going to be freed in bulk. Reviewed-by: Tom Lane <[email protected]> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
2023-04-06Fix ts_headline() edge cases for empty query and empty search text.Tom Lane
tsquery's GETQUERY() macro is only safe to apply to a tsquery that is known non-empty; otherwise it gives a pointer to garbage. Before commit 5a617d75d, ts_headline() avoided this pitfall, but only in a very indirect, nonobvious way. (hlCover could not reach its TS_execute call, because if the query contains no lexemes then hlFirstIndex would surely return -1.) After that commit, it fell into the trap, resulting in weird errors such as "unrecognized operator" and/or valgrind complaints. In HEAD, fix this by not calling TS_execute_locations() at all for an empty query. In the back branches, add a defensive check to hlCover() --- that's not fixing any live bug, but I judge the code a bit too fragile as-is. Also, both mark_hl_fragments() and mark_hl_words() were careless about the possibility of empty search text: in the cases where no match has been found, they'd end up telling mark_fragment() to mark from word indexes 0 to 0 inclusive, even when there is no word 0. This is harmless since we over-allocated the prs->words array, but it does annoy valgrind. Fix so that the end index is -1 and thus mark_fragment() will do nothing in such cases. Bottom line is that this fixes a live bug in HEAD, but in the back branches it's only getting rid of a valgrind nitpick. Back-patch anyway. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/[email protected]
2023-03-17Fix t_isspace(), etc., when datlocprovider=i and datctype=C.Jeff Davis
Check whether the datctype is C to determine whether t_isspace() and related functions use isspace() or iswspace(). Previously, t_isspace() checked whether the database default collation was C; which is incorrect when the default collation uses the ICU provider. Discussion: https://postgr.es/m/[email protected] Reviewed-by: Peter Eisentraut Backpatch-through: 15
2023-02-07Remove useless casts to (void *) in arguments of some system functionsPeter Eisentraut
The affected functions are: bsearch, memcmp, memcpy, memset, memmove, qsort, repalloc Reviewed-by: Corey Huinker <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
2023-02-06Remove useless casts to (void *) in hash_search() callsPeter Eisentraut
Some of these appear to be leftovers from when hash_search() took a char * argument (changed in 5999e78fc45dcb91784b64b6e9ae43f4e4f68ca2). Since after this there is some more horizontal space available, do some light reformatting where suitable. Reviewed-by: Corey Huinker <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
2023-01-19Fix ts_headline() to handle ORs and phrase queries more honestly.Tom Lane
This patch largely reverts what I did in commits c9b0c678d and 78e73e875. The maximum cover length limit that I added in 78e73e875 (to band-aid over c9b0c678d's performance issues) creates too many user-visible behavior discrepancies, as complained of for example in bug #17691. The real problem with hlCover() is not what I thought at the time, but more that it seems to have been designed with only AND tsquery semantics in mind. It doesn't work quite right for OR, and even less so for NOT or phrase queries. However, we can improve that situation by building a variant of TS_execute() that returns a list of match locations. We already get an ExecPhraseData struct representing match locations for the primitive case of a simple match, as well as one for a phrase match; we just need to add some logic to combine these for AND and OR operators. The result is a list of ExecPhraseDatas, which hlCover can regard as having simple AND semantics, so that its old algorithm works correctly. There's still a lot not to like about ts_headline's behavior, but I think the remaining issues have to do with the heuristics used in mark_hl_words and mark_hl_fragments (which, likewise, were not revisited when phrase search was added). Improving those is a task for another day. Patch by me; thanks to Alvaro Herrera for review. Discussion: https://postgr.es/m/[email protected]
2023-01-10New header varatt.h split off from postgres.hPeter Eisentraut
This new header contains all the variable-length data types support (TOAST support) from postgres.h, which isn't needed by large parts of the backend code. Reviewed-by: Tom Lane <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/ddcce239-0f29-6e62-4b47-1f8ca742addf%40enterprisedb.com
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-12-27Convert the reg* input functions to report (most) errors softly.Tom Lane
This is not really complete, but it catches most cases of practical interest. The main omissions are: * regtype, regprocedure, and regoperator parse type names by calling the main grammar, so any grammar-detected syntax error will still be a hard error. Also, if one includes a type modifier in such a type specification, errors detected by the typmodin function will be hard errors. * Lookup errors are handled just by passing missing_ok = true to the relevant catalog lookup function. Because we've used quite a restrictive definition of "missing_ok", this means that edge cases such as "the named schema exists, but you lack USAGE permission on it" are still hard errors. It would make sense to me to replace most/all missing_ok parameters with an escontext parameter and then allow these additional lookup failure cases to be trapped too. But that's a job for some other day. Discussion: https://postgr.es/m/[email protected]
2022-12-27Convert tsqueryin and tsvectorin to report errors softly.Tom Lane
This is slightly tedious because the adjustments cascade through a couple of levels of subroutines, but it's not very hard. I chose to avoid changing function signatures more than absolutely necessary, by passing the escontext pointer in existing structs where possible. tsquery's nuisance NOTICEs about empty queries are suppressed in soft-error mode, since they're not errors and we surely don't want them to be shown to the user anyway. Maybe that whole behavior should be reconsidered. Discussion: https://postgr.es/m/[email protected]
2022-12-21Switch some system functions to use get_call_result_type()Michael Paquier
This shaves some code by replacing the combinations of CreateTemplateTupleDesc()/TupleDescInitEntry() hardcoding a mapping of the attributes listed in pg_proc.dat by get_call_result_type() to build the TupleDesc needed for the rows generated. get_call_result_type() is more expensive than the former style, but this removes some duplication with the lists of OUT parameters (pg_proc.dat and the attributes hardcoded in these code paths). This is applied to functions that are not considered as critical (aka that could be called repeatedly for monitoring purposes). Author: Bharath Rupireddy Reviewed-by: Robert Haas, Álvaro Herrera, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CALj2ACV23HW5HP5hFjd89FNS-z5X8r2jNXdMXcpN2BgTtKd87w@mail.gmail.com
2022-12-20Add copyright notices to meson filesAndrew Dunstan
Discussion: https://postgr.es/m/[email protected]
2022-11-21Add comments and a missing CHECK_FOR_INTERRUPTS in ts_headline.Tom Lane
I just spent an annoying amount of time reverse-engineering the 100%-undocumented API between ts_headline and the text search parser's prsheadline function. Add some commentary about that while it's fresh in mind. Also remove some unused macros in wparser_def.c. While at it, I noticed that when commit 78e73e875 added a CHECK_FOR_INTERRUPTS call in TS_execute_recurse, it missed doing so in the parallel function TS_phrase_execute, which surely needs one just as much. Back-patch because of the missing CHECK_FOR_INTERRUPTS. Might as well back-patch the rest of this too.
2022-10-06Introduce t_isalnum() to replace t_isalpha() || t_isdigit() tests.Tom Lane
ts_locale.c omitted support for "isalnum" tests, perhaps on the grounds that there were initially no use-cases for that. However, both ltree and pg_trgm need such tests, and we do also have one use-case now in the core backend. The workaround of testing isalpha and isdigit separately seems quite inefficient, especially when dealing with multibyte characters; so let's fill in the missing support. Discussion: https://postgr.es/m/[email protected]
2022-10-05Rename shadowed local variablesDavid Rowley
In a similar effort to f01592f91, here we mostly rename shadowed local variables to remove the warnings produced when compiling with -Wshadow=compatible-local. This fixes 63 warnings and leaves just 5. Author: Justin Pryzby, David Rowley Reviewed-by: Justin Pryzby Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
2022-09-27Convert *GetDatum() and DatumGet*() macros to inline functionsPeter Eisentraut
The previous macro implementations just cast the argument to a target type but did not check whether the input type was appropriate. The function implementation can do better type checking of the input type. For the *GetDatumFast() macros, converting to an inline function doesn't work in the !USE_FLOAT8_BYVAL case, but we can use AssertVariableIsOfTypeMacro() to get a similar level of type checking. Reviewed-by: Aleksander Alekseev <[email protected]> Reviewed-by: Tom Lane <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
2022-09-22meson: Add initial version of meson based build systemAndres Freund
Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <[email protected]> Author: Nazir Bilal Yavuz <[email protected]> Author: Peter Eisentraut <[email protected]> Reviewed-By: Peter Eisentraut <[email protected]> Discussion: https://postgr.es/m/[email protected]
2022-09-13Split up guc.c for better build speed and ease of maintenance.Tom Lane
guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard- to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions. To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves. There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported. Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c. Discussion: https://postgr.es/m/[email protected]
2022-09-12Revert "Convert *GetDatum() and DatumGet*() macros to inline functions"Peter Eisentraut
This reverts commit 595836e99bf1ee6d43405b885fb69bb8c6d3ee23. It has problems when USE_FLOAT8_BYVAL is off.
2022-09-12Convert *GetDatum() and DatumGet*() macros to inline functionsPeter Eisentraut
The previous macro implementations just cast the argument to a target type but did not check whether the input type was appropriate. The function implementation can do better type checking of the input type. Reviewed-by: Aleksander Alekseev <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
2022-08-24Defend against stack overrun in a few more places.Tom Lane
SplitToVariants() in the ispell code, lseg_inside_poly() in geo_ops.c, and regex_selectivity_sub() in selectivity estimation could recurse until stack overflow; fix by adding check_stack_depth() calls. So could next() in the regex compiler, but that case is better fixed by converting its tail recursion to a loop. (We probably get better code that way too, since next() can now be inlined into its sole caller.) There remains a reachable stack overrun in the Turkish stemmer, but we'll need some advice from the Snowball people about how to fix that. Per report from Egor Chindyaskin and Alexander Lakhin. These mistakes are old, so back-patch to all supported branches. Richard Guo and Tom Lane Discussion: https://postgr.es/m/[email protected]
2022-08-24Further -Wshadow=compatible-local warning fixesDavid Rowley
These should have been included in 421892a19 as these shadowed variable warnings can also be fixed by adjusting the scope of the shadowed variable to put the declaration for it in an inner scope. This is part of the same effort as f01592f91. By my count, this takes the warning count from 114 down to 106. Author: David Rowley and Justin Pryzby Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
2022-07-12Invent qsort_interruptible().Tom Lane
Justin Pryzby reported that some scenarios could cause gathering of extended statistics to spend many seconds in an un-cancelable qsort() operation. To fix, invent qsort_interruptible(), which is just like qsort_arg() except that it will also do CHECK_FOR_INTERRUPTS every so often. This bloats the backend by a couple of kB, which seems like a good investment. (We considered just enabling CHECK_FOR_INTERRUPTS in the existing qsort and qsort_arg functions, but there are some callers for which that'd demonstrably be unsafe. Opt-in seems like a better way.) For now, just apply qsort_interruptible() in statistics collection. There's probably more places where it could be useful, but we can always change other call sites as we find problems. Back-patch to v14. Before that we didn't have extended stats on expressions, so that the problem was less severe. Also, this patch depends on the sort_template infrastructure introduced in v14. Tom Lane and Justin Pryzby Discussion: https://postgr.es/m/[email protected]
2022-07-01Add construct_array_builtin, deconstruct_array_builtinPeter Eisentraut
There were many calls to construct_array() and deconstruct_array() for built-in types, for example, when dealing with system catalog columns. These all hardcoded the type attributes necessary to pass to these functions. To simplify this a bit, add construct_array_builtin(), deconstruct_array_builtin() as wrappers that centralize this hardcoded knowledge. This simplifies many call sites and reduces the amount of hardcoded stuff that is spread around. Reviewed-by: Tom Lane <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/2914356f-9e5f-8c59-2995-5997fc48bcba%40enterprisedb.com
2022-04-13Remove extraneous blank lines before block-closing bracesAlvaro Herrera
These are useless and distracting. We wouldn't have written the code with them to begin with, so there's no reason to keep them. Author: Justin Pryzby <[email protected]> Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
2022-04-11Fix various typos and spelling mistakes in code commentsDavid Rowley
Author: Justin Pryzby Discussion: https://postgr.es/m/[email protected]
2022-01-08Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-10-11Clean up more code using "(expr) ? true : false"Michael Paquier
This is similar to fd0625c, taking care of any remaining code paths that are worth the cleanup. This also changes some cases using opposite expression patterns. Author: Justin Pryzby, Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoCdF8dnUvr-BUWWGvA_XhKSoANacBMZb6jKyCk4TYfQ2Q@mail.gmail.com
2021-09-08Clean up some code using "(expr) ? true : false"Michael Paquier
All the code paths simplified here were already using a boolean or used an expression that led to zero or one, making the extra bits unnecessary. Author: Justin Pryzby Reviewed-by: Tom Lane, Michael Paquier, Peter Smith Discussion: https://postgr.es/m/[email protected]
2021-07-01Improve various places that double the size of a bufferDavid Rowley
Several places were performing a tight loop to determine the first power of 2 number that's > or >= the required memory. Instead of using a loop for that, we can use pg_nextpower2_32 or pg_nextpower2_64. When we need a power of 2 number equal to or greater than a given amount, we just pass the amount to the nextpower2 function. When we need a power of 2 greater than the amount, we just pass the amount + 1. Additionally, in tsearch there were a couple of locations that were performing a while loop when a simple "if" would have done. In both of these locations only 1 item is being added, so the loop could only have ever iterated once. Changing the loop into an if statement makes the code very slightly more optimal as the condition is checked once rather than twice. There are quite a few remaining locations that increase the size of the buffer in the following form: while (reqsize >= buflen) { buflen *= 2; buf = repalloc(buf, buflen); } These are not touched in this commit. repalloc will error out for sizes larger than MaxAllocSize. Changing these to use pg_nextpower2_32 would remove the chance of that error being raised. It's unclear from the code if the sizes could ever become that large, so err on the side of caution. Discussion: https://postgr.es/m/CAApHDvp=tns7RL4PH0ZR0M+M-YFLquK7218x=0B_zO+DbOma+w@mail.gmail.com Reviewed-by: Zhihong Yu
2021-04-19Fix typos and grammar in comments and docsMichael Paquier
Author: Justin Pryzby Discussion: https://postgr.es/m/[email protected]
2021-03-19Don't leak compiled regex(es) when an ispell cache entry is dropped.Tom Lane
The text search cache mechanisms assume that we can clean up an invalidated dictionary cache entry simply by resetting the associated long-lived memory context. However, that does not work for ispell affixes that make use of regular expressions, because the regex library deals in plain old malloc. Hence, we leaked compiled regex(es) any time we dropped such a cache entry. That could quickly add up, since even a fairly trivial regex can use up tens of kB, and a large one can eat megabytes. Add a memory context callback to ensure that a regex gets freed when its owning cache entry is cleared. Found via valgrind testing. This problem is ancient, so back-patch to all supported branches. Discussion: https://postgr.es/m/[email protected]
2021-01-31Fix parsing of complex morphs to tsqueryAlexander Korotkov
When to_tsquery() or websearch_to_tsquery() meet a complex morph containing multiple words residing adjacent position, these words are connected with OP_AND operator. That leads to surprising results. For instace, both websearch_to_tsquery('"pg_class pg"') and to_tsquery('pg_class <-> pg') produce '( pg & class ) <-> pg' tsquery. This tsquery requires 'pg' and 'class' words to reside on the same position and doesn't match to to_tsvector('pg_class pg'). It appears to be ridiculous behavior, which needs to be fixed. This commit makes to_tsquery() or websearch_to_tsquery() connect words residing adjacent position with OP_PHRASE. Therefore, now those words are normally chained with other OP_PHRASE operator. The examples of above now produces 'pg <-> class <-> pg' tsquery, which matches to to_tsvector('pg_class pg'). Another effect of this commit is that complex morph word positions now need to match the tsvector even if there is no surrounding OP_PHRASE. This behavior change generally looks like an improvement but making this commit not backpatchable. Reported-by: Barry Pederson Bug: #16592 Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/CAPpHfdv0EzVhf6CWfB1_TTZqXV_2Sn-jSY3zSd7ePH%3D-%2B1V2DQ%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Tom Lane, Neil Chen
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-12-15Improve hash_create()'s API for some added robustness.Tom Lane
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which was formerly the default; and add assertions insisting that exactly one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set. This is in hopes of preventing recurrences of the type of oversight fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS). Also, when HASH_STRINGS is specified, insist that the keysize be more than 8 bytes. This is a heuristic, but it should catch accidental use of HASH_STRINGS for integer or pointer keys. (Nearly all existing use-cases set the keysize to NAMEDATALEN or more, so there's little reason to think this restriction should be problematic.) Tweak hash_create() to insist that the HASH_ELEM flag be set, and remove the defaults it had for keysize and entrysize. Since those defaults were undocumented and basically useless, no callers omitted HASH_ELEM anyway. Also, remove memset's zeroing the HASHCTL parameter struct from those callers that had one. This has never been really necessary, and while it wasn't a bad coding convention it was confusing that some callers did it and some did not. We might as well save a few cycles by standardizing on "not". Also improve the documentation for hash_create(). In passing, improve reinit.c's usage of a hash table by storing the key as a binary Oid rather than a string; and, since that's a temporary hash table, allocate it in CurrentMemoryContext for neatness. Discussion: https://postgr.es/m/[email protected]