Age | Commit message (Collapse) | Author |
|
This refactoring is a follow-up of the work done in 5a1dfde8334b, that
has switched 2PC file names to use FullTransactionIds when written on
disk. This will help with the integration of a follow-up solution
related to the handling of two-phase files during recovery, to address
older defects while reading these from disk after a crash.
This change is useful in itself as it reduces the need to build the
file names from epoch numbers and TransactionIds, because we can use
directly FullTransactionIds from which the 2PC file names are guessed.
So this avoids a lot of back-and-forth between the FullTransactionIds
retrieved from the file names and how these are passed around in the
internal 2PC logic.
Note that the core of the change is the use of a FullTransactionId
instead of a TransactionId in GlobalTransactionData, that tracks 2PC
file information in shared memory. The change in TwoPhaseCallback makes
this commit unfit for stable branches.
Noah has contributed a good chunk of this patch. I have spent some time
on it as well while working on the issues with two-phase state files and
recovery.
Author: Noah Misch <[email protected]>
Co-Authored-by: Michael Paquier <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/[email protected]
|
|
Also adjust the phrasing in the comments.
Author: Etsuro Fujita <[email protected]>
Author: Heikki Linnakangas <[email protected]>
Reviewed-by: Tender Wang <[email protected]>
Reviewed-by: Gurjeet Singh <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Discussion: https://postgr.es/m/CAPmGK17%3DPHSDZ%2B0G6jcj12buyyE1bQQc3sbp1Wxri7tODT-SDw%40mail.gmail.com
Backpatch-through: 15
|
|
Sami complained that there's a discrepancy between n_mod_since_analyze
and n_ins_since_vacuum, as the former only accounts for committed changes
and the latter tracks committed and aborted inserts. Nobody seemed
overly concerned that this would cause any concerning issues. The
repercussions, from what I can tell, are limited to causing an
autovacuum to trigger for inserts sooner than it otherwise might. For
typical ratios of commits to aborts, it's unlikely to ever be noticed.
Fixing things to make it so n_ins_since_vacuum only displays committed
inserts would require an additional field in PgStat_TableCounts, which
does not quite seem worthwhile at this stage. This commit just adds a
comment with some details to mention that we know about it, which will
hopefully prevent repeat discussions.
Reported-by: Sami Imseih <[email protected]>
Author: David Rowley <[email protected]>
Reviewed-by: Sami Imseih <[email protected]>
Discussion: https://postgr.es/m/CAApHDvpgV3a-R2EGmPOh0L-x3pHbZpM3y4dySWfy+UqUazwDQA@mail.gmail.com
|
|
This commit adds four fields to the statistics of relations, aggregating
the amount of time spent for each operation on a relation:
- total_vacuum_time, for manual vacuum.
- total_autovacuum_time, for vacuum done by the autovacuum daemon.
- total_analyze_time, for manual analyze.
- total_autoanalyze_time, for analyze done by the autovacuum daemon.
This gives users the option to derive the average time spent for these
operations with the help of the related "count" fields.
Bump catalog version (for the catalog changes) and PGSTAT_FILE_FORMAT_ID
(for the additions in PgStat_StatTabEntry).
Author: Sami Imseih
Reviewed-by: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/CAA5RZ0uVOGBYmPEeGF2d1B_67tgNjKx_bKDuL+oUftuoz+=Y1g@mail.gmail.com
|
|
9aea73fc61d4 has added support for backend statistics, relying on
PgStat_EntryRef->pending for its data pending for flush. This design
lacks in flexibility, because the pending list does some memory
allocation, making it unsuitable if incrementing counters in critical
sections.
Pending data of backend statistics is reworked so the implementation
does not depend on PgStat_EntryRef->pending anymore, relying on a static
area of memory to store the counters that are flushed when stats are
reported to the pgstats dshash. An advantage of this approach is to
allow the pending data to be manipulated in critical sections; some
patches are under discussion and require that.
The pending data is tracked by PendingBackendStats, local to
pgstat_backend.c. Two routines are introduced to allow IO statistics to
update the backend-side counters. have_static_pending_cb and
flush_static_cb are used for the flush, instead of flush_pending_cb.
Author: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/66efowskppsns35v5u2m7k4sdnl7yoz5bo64tdjwq7r5lhplrz@y7dme5xwh2r5
|
|
This commit changes the way pending backend statistics are tracked by
moving them into a new structure called PgStat_BackendPending, removing
PgStat_BackendPendingIO. PgStat_BackendPending currently only includes
PgStat_PendingIO for the pending I/O stats.
pgstat_flush_backend() is extended with a "flags" argument to control
which parts of the stats of a backend should be flushed.
With this refactoring, it becomes easier to plug into backend statistics
more data. A patch to add information related to WAL in this stats kind
is under discussion.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/Z3zqc4o09dM/[email protected]
|
|
Backpatch-through: 13
|
|
This adds a new variable-numbered statistics kind in pgstats, where the
object ID key of the stats entries is based on the proc number of the
backends. This acts as an upper-bound for the number of stats entries
that can exist at once. The entries are created when a backend starts
after authentication succeeds, and are removed when the backend exits,
making the stats entry exist for as long as their backend is up and
running. These are not written to the pgstats file at shutdown (note
that write_to_file is disabled, as a safety measure).
Currently, these stats include only information about the I/O generated
by a backend, using the same layer as pg_stat_io, except that it is now
possible to know how much activity is happening in each backend rather
than an overall aggregate of all the activity. A function called
pg_stat_get_backend_io() is added to access this data depending on the
PID of a backend. The existing structure could be expanded in the
future to add more information about other statistics related to
backends, depending on requirements or ideas.
Auxiliary processes are not included in this set of statistics. These
are less interesting to have than normal backends as they have dedicated
entries in pg_stat_io, and stats kinds of their own.
This commit includes also pg_stat_reset_backend_stats(), function able
to reset all the stats associated to a single backend.
Bump catalog version and PGSTAT_FILE_FORMAT_ID.
Author: Bertrand Drouvot
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, Michael Paquier, Nazir
Bilal Yavuz
Discussion: https://postgr.es/m/ZtXR+CtkEVVE/[email protected]
|
|
This new function tests if a memory region starting at a given location
for a defined length is made only of zeroes. This unifies in a single
path the all-zero checks that were happening in a couple of places of
the backend code:
- For pgstats entries of relation, checkpointer and bgwriter, where
some "all_zeroes" variables were previously used with memcpy().
- For all-zero buffer pages in PageIsVerifiedExtended().
This new function uses the same forward scan as the check for all-zero
buffer pages, applying it to the three pgstats paths mentioned above.
Author: Bertrand Drouvot
Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Peter Smith
Discussion: https://postgr.es/m/[email protected]
|
|
as determined by IWYU
These are mostly issues that are new since commit dbbca2cf299.
Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org
|
|
Similar to commit 7e735035f20.
Author: Richard Guo <[email protected]>
Reviewed-by: Bharath Rupireddy <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com
|
|
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
|
|
Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(),
IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of
new Am*Process() macros that use MyBackendType. For consistency with
the existing Am*Process() macros.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/[email protected]
|
|
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 12
|
|
This commit refactors find_tabstat_entry() so as transaction counters
for inserted, updated and deleted tuples are included in the result
returned. If a shared entry is found for a relation, its result is now
a copy of the PgStat_TableStatus entry retrieved from shared memory.
This idea has been proposed by Andres Freund.
While on it, the following SQL functions, used in system views, are
refactored with macros, in the same spirit as 83a1a1b56645, reducing the
amount of code:
- pg_stat_get_xact_tuples_deleted()
- pg_stat_get_xact_tuples_inserted()
- pg_stat_get_xact_tuples_updated()
There is now only one caller of find_tabstat_entry() in the tree.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/[email protected]
|
|
This commit renames the members of a few pgstat structures related to
functions and relations, by respectively removing their prefix "f_" and
"t_". The statistics for functions and relations and handled in their
own file, and pgstatfuncs.c associates each field in a structure
variable named based on the object type handled, so no information is
lost with this rename.
This will help with some of the refactoring aimed for pgstatfuncs.c, as
this makes more consistent the field names with the SQL functions
retrieving them.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Melanie Plageman
Discussion: https://postgr.es/m/[email protected]
|
|
Add pgstat counter to track row updates that result in the successor
version going to a new heap page, leaving behind an original version
whose t_ctid points to the new version. The current count is shown by
the n_tup_newpage_upd column of each of the pg_stat_*_tables views.
The new n_tup_newpage_upd column complements the existing n_tup_hot_upd
and n_tup_upd columns. Tables that have high n_tup_newpage_upd values
(relative to n_tup_upd) are good candidates for tuning heap fillfactor.
Corey Huinker, with small tweaks by me.
Author: Corey Huinker <[email protected]>
Reviewed-By: Peter Geoghegan <[email protected]>
Reviewed-By: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/CADkLM=ded21M9iZ36hHm-vj2rE2d=zcKpUQMds__Xm2pxLfHKA@mail.gmail.com
|
|
This commit adds the infrastructure for more detailed IO statistics. The calls
to actually count IOs, a system view to access the new statistics,
documentation and tests will be added in subsequent commits, to make review
easier.
While we already had some IO statistics, e.g. in pg_stat_bgwriter and
pg_stat_database, they did not provide sufficient detail to understand what
the main sources of IO are, or whether configuration changes could avoid
IO. E.g., pg_stat_bgwriter.buffers_backend does contain the number of buffers
written out by a backend, but as that includes extending relations (always
done by backends) and writes triggered by the use of buffer access strategies,
it cannot easily be used to tune background writer or checkpointer. Similarly,
pg_stat_database.blks_read cannot easily be used to tune shared_buffers /
compute a cache hit ratio, as the use of buffer access strategies will often
prevent a large fraction of the read blocks to end up in shared_buffers.
The new IO statistics count IO operations (evict, extend, fsync, read, reuse,
and write), and are aggregated for each combination of backend type (backend,
autovacuum worker, bgwriter, etc), target object of the IO (relations, temp
relations) and context of the IO (normal, vacuum, bulkread, bulkwrite).
What is tracked in this series of patches, is sufficient to perform the
aforementioned analyses. Further details, e.g. tracking the number of buffer
hits, would make that even easier, but was left out for now, to keep the scope
of the already large patchset manageable.
Bumps PGSTAT_FILE_FORMAT_ID.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Justin Pryzby <[email protected]>
Reviewed-by: Kyotaro Horiguchi <[email protected]>
Discussion: https://postgr.es/m/[email protected]
|
|
This is in preparation for commiting a larger patch series in the area.
Discussion: https://postgr.es/m/CAAKRu_bHwGEbzNxxy+MQDkrsgog6aO6iUvajJ4d6PD98gFU7+w@mail.gmail.com
|
|
Backpatch-through: 11
|
|
The same code pattern is repeated 21 times for int64 counters (0 for
missing entry) and 5 times for doubles (0 for missing entry) on database
entries. This code is switched to use macros for the basic code
instead, shaving a few hundred lines of originally-duplicated code
patterns. The function names remain the same, but some fields of
PgStat_StatDBEntry have to be renamed to cope with the new style.
This is in the same spirit as 83a1a1b.
Author: Michael Paquier
Reviewed-by: Nathan Bossart, Bertrand Drouvot
Discussion: https://postgr.es/m/[email protected]
|
|
The same code pattern is repeated 17 times for int64 counters (0 for
missing entry) and 5 times for timestamps (NULL for missing entry) on
table entries. This code is switched to use a macro for the basic code
instead, shaving a few hundred lines of originally-duplicated code. The
function names remain the same, but some fields of PgStat_StatTabEntry
have to be renamed to cope with the new style.
Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart
Discussion: https:/postgr.es/m/20221204173207.GA2669116@nathanxps13
|
|
As the list of shared relations is fixed, we can just dispatch based
IsSharedRelation(), instead of first trying to look up stats for a non-shared
rel and falling back to shared stats.
Author: "Drouvot, Bertrand" <[email protected]>
Reviewed-by: Bharath Rupireddy <[email protected]>
Reviewed-by: Andres Freund <[email protected]
Discussion: https://postgr.es/m/[email protected]
|
|
It can be useful to know when a relation has last been used, e.g., when
evaluating whether an index is still required. It was already possible to
infer the time of the last usage by tracking, e.g.,
pg_stat_all_indexes.idx_scan over time. But far from everybody does so.
To make it easier to detect the last time a relation has been scanned, track
that time in each relation's pgstat entry. To minimize overhead a) the
timestamp is updated only when the backend pending stats entry is flushed to
shared stats b) the last transaction's stop timestamp is used as the
timestamp.
Bumps catalog and stats format versions.
Author: Dave Page <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Bruce Momjian <[email protected]>
Reviewed-by: Vik Fearing <[email protected]>
Discussion: https://postgr.es/m/CA+OCxozrVHNFVEPkweUHMZje+t1tfY816d9MZYc6eZwOOusOaQ@mail.gmail.com
|
|
Just after committing 5891c7a8ed8, a test running with debug_discard_caches=1
failed locally...
pgstat_drop_relation() neither checked pgstat_should_count_relation() nor
called pgstat_prep_relation_pending(). With debug_discard_caches=1
rel->pgstat_info wasn't set up, leading pg_stat_get_xact_tuples_inserted()
spuriously still returning > 0 while in the transaction dropping the table.
|
|
Previously the statistics collector received statistics updates via UDP and
shared statistics data by writing them out to temporary files regularly. These
files can reach tens of megabytes and are written out up to twice a
second. This has repeatedly prevented us from adding additional useful
statistics.
Now statistics are stored in shared memory. Statistics for variable-numbered
objects are stored in a dshash hashtable (backed by dynamic shared
memory). Fixed-numbered stats are stored in plain shared memory.
The header for pgstat.c contains an overview of the architecture.
The stats collector is not needed anymore, remove it.
By utilizing the transactional statistics drop infrastructure introduced in a
prior commit statistics entries cannot "leak" anymore. Previously leaked
statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On
systems with many small relations pgstat_vacuum_stat() could be quite
expensive.
Now that replicas drop statistics entries for dropped objects, it is not
necessary anymore to reset stats when starting from a cleanly shut down
replica.
Subsequent commits will perform some further code cleanup, adapt docs and add
tests.
Bumps PGSTAT_FILE_FORMAT_ID.
Author: Kyotaro Horiguchi <[email protected]>
Author: Andres Freund <[email protected]>
Author: Melanie Plageman <[email protected]>
Reviewed-By: Andres Freund <[email protected]>
Reviewed-By: Thomas Munro <[email protected]>
Reviewed-By: Justin Pryzby <[email protected]>
Reviewed-By: "David G. Johnston" <[email protected]>
Reviewed-By: Tomas Vondra <[email protected]> (in a much earlier version)
Reviewed-By: Arthur Zakirov <[email protected]> (in a much earlier version)
Reviewed-By: Antonin Houska <[email protected]> (in a much earlier version)
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/[email protected]
|
|
Most of pgstat uses pgstat_<verb>_<subject>() or just <verb>_<subject>(). But
not all (some introduced fairly recently by me). Rename ones that aren't
intentionally following a different scheme (e.g. AtEOXact_*).
|
|
One problematic part of the current statistics collector design is that there
is no reliable way of getting rid of statistics entries. Because of that
pgstat_vacuum_stat() (called by [auto-]vacuum) matches all stats for the
current database with the catalog contents and tries to drop now-superfluous
entries. That's quite expensive. What's worse, it doesn't work on physical
replicas, despite physical replicas collection statistics entries.
This commit introduces infrastructure to create / drop statistics entries
transactionally, together with the underlying catalog objects (functions,
relations, subscriptions). pgstat_xact.c maintains a list of stats entries
created / dropped transactionally in the current transaction. To ensure the
removal of statistics entries is durable dropped statistics entries are
included in commit / abort (and prepare) records, which also ensures that
stats entries are dropped on standbys.
Statistics entries created separately from creating the underlying catalog
object (e.g. when stats were previously lost due to an immediate restart)
are *not* WAL logged. However that can only happen outside of the transaction
creating the catalog object, so it does not lead to "leaked" statistics
entries.
For this to work, functions creating / dropping functions / relations /
subscriptions need to call into pgstat. For subscriptions this was already
done when dropping subscriptions, via pgstat_report_subscription_drop() (now
renamed to pgstat_drop_subscription()).
This commit does not actually drop stats yet, it just provides the
infrastructure. It is however a largely independent piece of infrastructure,
so committing it separately makes sense.
Bumps XLOG_PAGE_MAGIC.
Author: Andres Freund <[email protected]>
Reviewed-By: Thomas Munro <[email protected]>
Reviewed-By: Kyotaro Horiguchi <[email protected]>
Discussion: https://postgr.es/m/[email protected]
|
|
Until now index_concurrently_swap() directly modified pgstat internal
datastructures. That will break with the introduction of shared memory
statistics and seems off architecturally.
This is done separately from the - quite large - shared memory statistics
patch to make review easier.
Author: Andres Freund <[email protected]>
Author: Kyotaro Horiguchi <[email protected]>
Reviewed-By: Kyotaro Horiguchi <[email protected]>
Discussion: https://postgr.es/m/[email protected]
|
|
Soon the stats collector will be no more, with statistics instead getting
stored in shared memory. There are a lot of references to the stats collector
in comments. This commit replaces most of these references with "cumulative
statistics system", with the remaining ones getting replaced as part of
subsequent commits.
This is done separately from the - quite large - shared memory statistics
patch to make review easier.
Author: Andres Freund <[email protected]>
Reviewed-By: Justin Pryzby <[email protected]>
Reviewed-By: Thomas Munro <[email protected]>
Reviewed-By: Kyotaro Horiguchi <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/[email protected]
|
|
I got the location wrong in 13619598f10. The name did make it sound like it
belonged in pgstat_relation.c...
|
|
There was a wild mishmash of function comment formatting in pgstat, making it
hard to know what to use for any new function and hard to extend existing
comments (particularly due to randomly different forms of indentation).
Author: Andres Freund <[email protected]>
Reviewed-By: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/[email protected]
|
|
pgstat.c is very long, and it's hard to find an order that makes sense and is
likely to be maintained over time. Splitting the different pieces into
separate files makes that a lot easier.
With a few exceptions, this commit just moves code around. Those exceptions
are:
- adding file headers for new files
- removing 'static' from functions
- adapting pgstat_assert_is_up() to work across TUs
- minor comment adjustments
git diff --color-moved=dimmed-zebra is very helpful separating code movement
from code changes.
The next commit in this series will reorder pgstat.[ch] contents to be a bit
more coherent.
Earlier revisions of this patch had "global" statistics (archiver, bgwriter,
checkpointer, replication slots, SLRU, WAL) in one file, because each seemed
small enough. However later commits will increase their size and their
aggregate size is not insubstantial. It also just seems easier to split each
type of statistic into its own file.
Author: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/[email protected]
|