author | Andres Freund | 2019-03-11 19:46:41 +0000 |
---|---|---|
committer | Andres Freund | 2019-03-11 19:46:41 +0000 |
commit | c2fe139c201c48f1133e9fbea2dd99b8efe2fadd (patch) | |
tree | ab0a6261b412b8284b6c91af158f72af97e02a35 /src/include/access | |
parent | a47841528107921f02c280e0c5f91c5a1d86adb0 (diff) |
tableam: Add and use scan APIs.
To allow table accesses to not depend directly on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with table_ versions.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on;
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends needs
to be usable without the user doing per-AM work. To achieve
that, new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap: table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore, that seems cleaner.
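
The *call_again protocol described in 3) can be hard to picture from
prose alone. Below is a minimal, self-contained sketch of it. It does
not use PostgreSQL headers: every Toy*/toy_* name is hypothetical, a
tid is modelled as an array index, and visibility is a precomputed
flag rather than a real snapshot test. The two visible versions in the
demo chain are purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (assumptions), not PostgreSQL definitions. */
typedef struct ToyVersion
{
    int  value;    /* payload of this tuple version */
    bool visible;  /* precomputed result of the visibility test */
    int  next;     /* array index of the next version in the chain, -1 = end */
} ToyVersion;

typedef struct ToySlot
{
    int value;
} ToySlot;

/* Plays the role of an AM struct that would embed IndexFetchTableData. */
typedef struct ToyIndexFetch
{
    const ToyVersion *versions;
    int resume_at;  /* where to continue when the caller calls again */
} ToyIndexFetch;

/*
 * Fetch a visible version reachable from `tid` into `slot`.
 * On entry *call_again is false for the first call for a tid; on exit it
 * is true if further versions in the chain might still match.
 */
static bool
toy_index_fetch_tuple(ToyIndexFetch *fetch, int tid, ToySlot *slot,
                      bool *call_again)
{
    int cur = *call_again ? fetch->resume_at : tid;

    *call_again = false;
    while (cur != -1)
    {
        const ToyVersion *v = &fetch->versions[cur];

        cur = v->next;
        if (v->visible)
        {
            slot->value = v->value;
            fetch->resume_at = cur;
            *call_again = (cur != -1);
            return true;
        }
    }
    return false;
}

/* Caller side: keep asking for the same tid while *call_again is set. */
static int
toy_fetch_all(ToyIndexFetch *fetch, int tid, int *out, int max_out)
{
    ToySlot slot = {0};
    bool call_again = false;
    int n = 0;

    assert(out != NULL);
    do
    {
        if (toy_index_fetch_tuple(fetch, tid, &slot, &call_again) &&
            n < max_out)
            out[n++] = slot.value;
    } while (call_again);
    return n;
}

/* Demo: tid 0 points at a stale version whose chain leads to two others. */
static int
toy_demo_index_fetch(int *out, int max_out)
{
    static const ToyVersion versions[] = {
        {10, false, 1},
        {11, true, 2},
        {12, true, -1},
    };
    ToyIndexFetch fetch = {versions, -1};

    return toy_fetch_all(&fetch, 0, out, max_out);
}
```

The point of the sketch is that the index hands over a tid that may
refer to an outdated version, and it is the table AM's job to walk to
the version that should be returned and to remember, in its own fetch
state, where to resume.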
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's type.
table_slot_callbacks() and table_slot_create() are based upon that,
but have additional logic to deal with views, foreign tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
would also have needed to be adapted for table_slot_callbacks(),
making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
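
To make a) and b) concrete, here is a small self-contained sketch of
how an AM might hand out its slot type and later answer a visibility
question from the slot alone. It is a toy model, not PostgreSQL code:
all Toy*/toy_*/versioned_* names are hypothetical, and the visibility
rule (a single "as of" transaction id) is a deliberate simplification
that does not reflect real MVCC rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int ToyXid;

/* Simplified snapshot (assumption): everything up to as_of is visible. */
typedef struct ToySnapshot
{
    ToyXid as_of;
} ToySnapshot;

/* Identifies a slot implementation, loosely like TupleTableSlotOps. */
typedef struct ToySlotOps
{
    const char *name;
} ToySlotOps;

/* The slot carries what the AM needs for a later visibility check. */
typedef struct ToySlot
{
    const ToySlotOps *ops;
    ToyXid xmin;   /* creating transaction */
    ToyXid xmax;   /* deleting transaction, 0 = not deleted */
    int    value;
} ToySlot;

struct ToyRelation;

typedef struct ToyAmRoutine
{
    const ToySlotOps *(*slot_callbacks) (const struct ToyRelation *rel);
    bool (*tuple_satisfies_snapshot) (const struct ToyRelation *rel,
                                      const ToySlot *slot,
                                      const ToySnapshot *snapshot);
} ToyAmRoutine;

typedef struct ToyRelation
{
    const ToyAmRoutine *am;
} ToyRelation;

/* --- one concrete toy AM --- */
static const ToySlotOps versioned_slot_ops = {"versioned"};

static const ToySlotOps *
versioned_slot_callbacks(const ToyRelation *rel)
{
    (void) rel;
    return &versioned_slot_ops;
}

static bool
versioned_satisfies_snapshot(const ToyRelation *rel, const ToySlot *slot,
                             const ToySnapshot *snapshot)
{
    (void) rel;
    /* The slot has to be of the type this AM handed out. */
    assert(slot->ops == &versioned_slot_ops);
    if (slot->xmin > snapshot->as_of)
        return false;           /* created after the snapshot */
    return slot->xmax == 0 || slot->xmax > snapshot->as_of;
}

static const ToyAmRoutine versioned_am = {
    versioned_slot_callbacks,
    versioned_satisfies_snapshot,
};

/* --- AM-independent wrappers --- */
static ToySlot
toy_slot_create(const ToyRelation *rel)
{
    ToySlot slot = {rel->am->slot_callbacks(rel), 0, 0, 0};

    return slot;
}

static bool
toy_tuple_satisfies_snapshot(const ToyRelation *rel, const ToySlot *slot,
                             const ToySnapshot *snapshot)
{
    return rel->am->tuple_satisfies_snapshot(rel, slot, snapshot);
}

/* Demo helper: build a slot and test it against a snapshot. */
static bool
toy_demo_visible(ToyXid xmin, ToyXid xmax, ToyXid as_of)
{
    ToyRelation rel = {&versioned_am};
    ToySnapshot snap = {as_of};
    ToySlot slot = toy_slot_create(&rel);

    slot.xmin = xmin;
    slot.xmax = xmax;
    return toy_tuple_satisfies_snapshot(&rel, &slot, &snap);
}
```

Callers never look inside the slot; they only pass it back to the AM
that created it, which is why the slot type and the visibility
callback have to come from the same AM.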
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al., now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 3) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
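
For readers unfamiliar with the "base class" idiom used throughout, a
converted scan loop has roughly the shape sketched below. The sketch
is self-contained and does not include PostgreSQL headers; all
Toy*/toy_*/array_* names are hypothetical stand-ins. In the real tree
the corresponding calls are table_beginscan(),
table_scan_getnextslot() and table_endscan(), dispatching through
rel->rd_tableam as shown in the tableam.h part of the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct ToyAmRoutine;

typedef struct ToySlot
{
    int value;
} ToySlot;

typedef struct ToyRelation
{
    const struct ToyAmRoutine *am;  /* plays the role of rd_tableam */
    const int *rows;
    size_t nrows;
} ToyRelation;

/* AM-independent part of a scan: the "base class". */
typedef struct ToyScanDescData
{
    ToyRelation *rel;
} ToyScanDescData;
typedef ToyScanDescData *ToyScanDesc;

typedef struct ToyAmRoutine
{
    ToyScanDesc (*scan_begin) (ToyRelation *rel);
    bool (*scan_getnextslot) (ToyScanDesc scan, ToySlot *slot);
    void (*scan_end) (ToyScanDesc scan);
} ToyAmRoutine;

/* AM-specific scan state; the base must be the first member. */
typedef struct ArrayScanDescData
{
    ToyScanDescData base;
    size_t next;  /* index of the next row to return */
} ArrayScanDescData;

static ToyScanDesc
array_scan_begin(ToyRelation *rel)
{
    ArrayScanDescData *scan = malloc(sizeof(*scan));

    assert(scan != NULL);
    scan->base.rel = rel;
    scan->next = 0;
    return &scan->base;
}

static bool
array_scan_getnextslot(ToyScanDesc sscan, ToySlot *slot)
{
    /* Valid because base is the first member of ArrayScanDescData. */
    ArrayScanDescData *scan = (ArrayScanDescData *) sscan;

    if (scan->next >= sscan->rel->nrows)
        return false;
    slot->value = sscan->rel->rows[scan->next++];
    return true;
}

static void
array_scan_end(ToyScanDesc sscan)
{
    free((ArrayScanDescData *) sscan);
}

static const ToyAmRoutine array_am = {
    array_scan_begin,
    array_scan_getnextslot,
    array_scan_end,
};

/* AM-independent wrappers, analogous in spirit to the table_* functions. */
static ToyScanDesc
toy_beginscan(ToyRelation *rel)
{
    return rel->am->scan_begin(rel);
}

static bool
toy_scan_getnextslot(ToyScanDesc scan, ToySlot *slot)
{
    return scan->rel->am->scan_getnextslot(scan, slot);
}

static void
toy_endscan(ToyScanDesc scan)
{
    scan->rel->am->scan_end(scan);
}

/* Caller code only ever sees the base descriptor and a slot. */
static int
toy_sum_rows(ToyRelation *rel)
{
    ToySlot slot = {0};
    ToyScanDesc scan = toy_beginscan(rel);
    int sum = 0;

    while (toy_scan_getnextslot(scan, &slot))
        sum += slot.value;
    toy_endscan(scan);
    return sum;
}

static int
toy_demo_sum(void)
{
    static const int rows[] = {1, 2, 3};
    ToyRelation rel = {&array_am, rows, 3};

    return toy_sum_rows(&rel);
}
```

Embedding the base struct as the first member is what lets the AM cast
the generic descriptor back to its own type without the caller knowing
anything about it.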
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/[email protected]
https://postgr.es/m/[email protected]
Diffstat (limited to 'src/include/access')
-rw-r--r-- | src/include/access/genam.h | 6 |
-rw-r--r-- | src/include/access/heapam.h | 92 |
-rw-r--r-- | src/include/access/relscan.h | 122 |
-rw-r--r-- | src/include/access/tableam.h | 468 |
4 files changed, 601 insertions, 87 deletions
diff --git a/src/include/access/genam.h b/src/include/access/genam.h index c4aba39496f..cad66513f62 100644 --- a/src/include/access/genam.h +++ b/src/include/access/genam.h @@ -159,8 +159,10 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel, ParallelIndexScanDesc pscan); extern ItemPointer index_getnext_tid(IndexScanDesc scan, ScanDirection direction); -extern HeapTuple index_fetch_heap(IndexScanDesc scan); -extern HeapTuple index_getnext(IndexScanDesc scan, ScanDirection direction); +struct TupleTableSlot; +extern bool index_fetch_heap(IndexScanDesc scan, struct TupleTableSlot *slot); +extern bool index_getnext_slot(IndexScanDesc scan, ScanDirection direction, + struct TupleTableSlot *slot); extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap); extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info, diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h index ab0879138f0..1b6607fe902 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -15,6 +15,7 @@ #define HEAPAM_H #include "access/relation.h" /* for backward compatibility */ +#include "access/relscan.h" #include "access/sdir.h" #include "access/skey.h" #include "access/table.h" /* for backward compatibility */ @@ -60,6 +61,48 @@ typedef struct HeapUpdateFailureData CommandId cmax; } HeapUpdateFailureData; +/* + * Descriptor for heap table scans. 
+ */ +typedef struct HeapScanDescData +{ + TableScanDescData rs_base; /* AM independent part of the descriptor */ + + /* state set up at initscan time */ + BlockNumber rs_nblocks; /* total number of blocks in rel */ + BlockNumber rs_startblock; /* block # to start at */ + BlockNumber rs_numblocks; /* max number of blocks to scan */ + /* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */ + + /* scan current state */ + bool rs_inited; /* false = scan not init'd yet */ + BlockNumber rs_cblock; /* current block # in scan, if any */ + Buffer rs_cbuf; /* current buffer in scan, if any */ + /* NB: if rs_cbuf is not InvalidBuffer, we hold a pin on that buffer */ + + /* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */ + BufferAccessStrategy rs_strategy; /* access strategy for reads */ + + HeapTupleData rs_ctup; /* current tuple in scan, if any */ + + /* these fields only used in page-at-a-time mode and for bitmap scans */ + int rs_cindex; /* current tuple's index in vistuples */ + int rs_ntuples; /* number of visible tuples on page */ + OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */ +} HeapScanDescData; +typedef struct HeapScanDescData *HeapScanDesc; + +/* + * Descriptor for fetches from heap via an index. + */ +typedef struct IndexFetchHeapData +{ + IndexFetchTableData xs_base; /* AM independent part of the descriptor */ + + Buffer xs_cbuf; /* current heap buffer in scan, if any */ + /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */ +} IndexFetchHeapData; + /* Result codes for HeapTupleSatisfiesVacuum */ typedef enum { @@ -79,42 +122,32 @@ typedef enum */ -/* struct definitions appear in relscan.h */ -typedef struct HeapScanDescData *HeapScanDesc; -typedef struct ParallelHeapScanDescData *ParallelHeapScanDesc; - /* * HeapScanIsValid * True iff the heap scan is valid. 
*/ #define HeapScanIsValid(scan) PointerIsValid(scan) -extern HeapScanDesc heap_beginscan(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key); -extern HeapScanDesc heap_beginscan_catalog(Relation relation, int nkeys, - ScanKey key); -extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key, - bool allow_strat, bool allow_sync); -extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot, - int nkeys, ScanKey key); -extern HeapScanDesc heap_beginscan_sampling(Relation relation, - Snapshot snapshot, int nkeys, ScanKey key, - bool allow_strat, bool allow_sync, bool allow_pagemode); -extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, +extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot, + int nkeys, ScanKey key, + ParallelTableScanDesc parallel_scan, + bool allow_strat, + bool allow_sync, + bool allow_pagemode, + bool is_bitmapscan, + bool is_samplescan, + bool temp_snap); +extern void heap_setscanlimits(TableScanDesc scan, BlockNumber startBlk, BlockNumber endBlk); -extern void heapgetpage(HeapScanDesc scan, BlockNumber page); -extern void heap_rescan(HeapScanDesc scan, ScanKey key); -extern void heap_rescan_set_params(HeapScanDesc scan, ScanKey key, +extern void heapgetpage(TableScanDesc scan, BlockNumber page); +extern void heap_rescan(TableScanDesc scan, ScanKey key, bool set_params, + bool allow_strat, bool allow_sync, bool allow_pagemode); +extern void heap_rescan_set_params(TableScanDesc scan, ScanKey key, bool allow_strat, bool allow_sync, bool allow_pagemode); -extern void heap_endscan(HeapScanDesc scan); -extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction); - -extern Size heap_parallelscan_estimate(Snapshot snapshot); -extern void heap_parallelscan_initialize(ParallelHeapScanDesc target, - Relation relation, Snapshot snapshot); -extern void heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan); -extern 
HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc); +extern void heap_endscan(TableScanDesc scan); +extern HeapTuple heap_getnext(TableScanDesc scan, ScanDirection direction); +extern bool heap_getnextslot(TableScanDesc sscan, + ScanDirection direction, struct TupleTableSlot *slot); extern bool heap_fetch(Relation relation, Snapshot snapshot, HeapTuple tuple, Buffer *userbuf, bool keep_buf, @@ -164,7 +197,6 @@ extern void simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup); extern void heap_sync(Relation relation); -extern void heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot); /* in heap/pruneheap.c */ extern void heap_page_prune_opt(Relation relation, Buffer buffer); @@ -190,7 +222,7 @@ extern void heap_vacuum_rel(Relation onerel, int options, /* in heap/heapam_visibility.c */ extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot, - Buffer buffer); + Buffer buffer); extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid, Buffer buffer); extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple stup, TransactionId OldestXmin, diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h index b78ef2f47d0..82de4cdcf2c 100644 --- a/src/include/access/relscan.h +++ b/src/include/access/relscan.h @@ -21,63 +21,76 @@ #include "storage/spin.h" #include "utils/relcache.h" + +struct ParallelTableScanDescData; + /* - * Shared state for parallel heap scan. - * - * Each backend participating in a parallel heap scan has its own - * HeapScanDesc in backend-private memory, and those objects all contain - * a pointer to this structure. The information here must be sufficient - * to properly initialize each new HeapScanDesc as workers join the scan, - * and it must act as a font of block numbers for those workers. + * Generic descriptor for table scans. This is the base-class for table scans, + * which needs to be embedded in the scans of individual AMs. 
*/ -typedef struct ParallelHeapScanDescData -{ - Oid phs_relid; /* OID of relation to scan */ - bool phs_syncscan; /* report location to syncscan logic? */ - BlockNumber phs_nblocks; /* # blocks in relation at start of scan */ - slock_t phs_mutex; /* mutual exclusion for setting startblock */ - BlockNumber phs_startblock; /* starting block number */ - pg_atomic_uint64 phs_nallocated; /* number of blocks allocated to - * workers so far. */ - bool phs_snapshot_any; /* SnapshotAny, not phs_snapshot_data? */ - char phs_snapshot_data[FLEXIBLE_ARRAY_MEMBER]; -} ParallelHeapScanDescData; - -typedef struct HeapScanDescData +typedef struct TableScanDescData { /* scan parameters */ Relation rs_rd; /* heap relation descriptor */ struct SnapshotData *rs_snapshot; /* snapshot to see */ int rs_nkeys; /* number of scan keys */ - struct ScanKeyData *rs_key; /* array of scan key descriptors */ + struct ScanKeyData *rs_key; /* array of scan key descriptors */ bool rs_bitmapscan; /* true if this is really a bitmap scan */ bool rs_samplescan; /* true if this is really a sample scan */ bool rs_pageatatime; /* verify visibility page-at-a-time? */ bool rs_allow_strat; /* allow or disallow use of access strategy */ bool rs_allow_sync; /* allow or disallow use of syncscan */ bool rs_temp_snap; /* unregister snapshot at scan end? */ - - /* state set up at initscan time */ - BlockNumber rs_nblocks; /* total number of blocks in rel */ - BlockNumber rs_startblock; /* block # to start at */ - BlockNumber rs_numblocks; /* max number of blocks to scan */ - /* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */ - BufferAccessStrategy rs_strategy; /* access strategy for reads */ bool rs_syncscan; /* report location to syncscan logic? 
*/ - /* scan current state */ - bool rs_inited; /* false = scan not init'd yet */ - HeapTupleData rs_ctup; /* current tuple in scan, if any */ - BlockNumber rs_cblock; /* current block # in scan, if any */ - Buffer rs_cbuf; /* current buffer in scan, if any */ - /* NB: if rs_cbuf is not InvalidBuffer, we hold a pin on that buffer */ - struct ParallelHeapScanDescData *rs_parallel; /* parallel scan information */ + struct ParallelTableScanDescData *rs_parallel; /* parallel scan + * information */ - /* these fields only used in page-at-a-time mode and for bitmap scans */ - int rs_cindex; /* current tuple's index in vistuples */ - int rs_ntuples; /* number of visible tuples on page */ - OffsetNumber rs_vistuples[MaxHeapTuplesPerPage]; /* their offsets */ -} HeapScanDescData; +} TableScanDescData; +typedef struct TableScanDescData *TableScanDesc; + +/* + * Shared state for parallel table scan. + * + * Each backend participating in a parallel table scan has its own + * TableScanDesc in backend-private memory, and those objects all contain a + * pointer to this structure. The information here must be sufficient to + * properly initialize each new TableScanDesc as workers join the scan, and it + * must act as a information what to scan for those workers. + */ +typedef struct ParallelTableScanDescData +{ + Oid phs_relid; /* OID of relation to scan */ + bool phs_syncscan; /* report location to syncscan logic? */ + bool phs_snapshot_any; /* SnapshotAny, not phs_snapshot_data? */ + Size phs_snapshot_off; /* data for snapshot */ +} ParallelTableScanDescData; +typedef struct ParallelTableScanDescData *ParallelTableScanDesc; + +/* + * Shared state for parallel table scans, for block oriented storage. 
+ */ +typedef struct ParallelBlockTableScanDescData +{ + ParallelTableScanDescData base; + + BlockNumber phs_nblocks; /* # blocks in relation at start of scan */ + slock_t phs_mutex; /* mutual exclusion for setting startblock */ + BlockNumber phs_startblock; /* starting block number */ + pg_atomic_uint64 phs_nallocated; /* number of blocks allocated to + * workers so far. */ +} ParallelBlockTableScanDescData; +typedef struct ParallelBlockTableScanDescData *ParallelBlockTableScanDesc; + +/* + * Base class for fetches from a table via an index. This is the base-class + * for such scans, which needs to be embedded in the respective struct for + * individual AMs. + */ +typedef struct IndexFetchTableData +{ + Relation rel; +} IndexFetchTableData; /* * We use the same IndexScanDescData structure for both amgettuple-based @@ -92,7 +105,7 @@ typedef struct IndexScanDescData struct SnapshotData *xs_snapshot; /* snapshot to see */ int numberOfKeys; /* number of index qualifier conditions */ int numberOfOrderBys; /* number of ordering operators */ - struct ScanKeyData *keyData; /* array of index qualifier descriptors */ + struct ScanKeyData *keyData; /* array of index qualifier descriptors */ struct ScanKeyData *orderByData; /* array of ordering op descriptors */ bool xs_want_itup; /* caller requests index tuples */ bool xs_temp_snap; /* unregister snapshot at scan end? 
*/ @@ -115,12 +128,13 @@ typedef struct IndexScanDescData IndexTuple xs_itup; /* index tuple returned by AM */ struct TupleDescData *xs_itupdesc; /* rowtype descriptor of xs_itup */ HeapTuple xs_hitup; /* index data returned by AM, as HeapTuple */ - struct TupleDescData *xs_hitupdesc; /* rowtype descriptor of xs_hitup */ + struct TupleDescData *xs_hitupdesc; /* rowtype descriptor of xs_hitup */ + + ItemPointerData xs_heaptid; /* result */ + bool xs_heap_continue; /* T if must keep walking, potential + * further results */ + IndexFetchTableData *xs_heapfetch; - /* xs_ctup/xs_cbuf/xs_recheck are valid after a successful index_getnext */ - HeapTupleData xs_ctup; /* current heap tuple, if any */ - Buffer xs_cbuf; /* current heap buffer in scan, if any */ - /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */ bool xs_recheck; /* T means scan keys must be rechecked */ /* @@ -134,9 +148,6 @@ typedef struct IndexScanDescData bool *xs_orderbynulls; bool xs_recheckorderby; - /* state data for traversing HOT chains in index_getnext */ - bool xs_continue_hot; /* T if must keep walking HOT chain */ - /* parallel index scan information, in shared memory */ struct ParallelIndexScanDescData *parallel_scan; } IndexScanDescData; @@ -150,14 +161,17 @@ typedef struct ParallelIndexScanDescData char ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER]; } ParallelIndexScanDescData; -/* Struct for heap-or-index scans of system tables */ +struct TupleTableSlot; + +/* Struct for storage-or-index scans of system tables */ typedef struct SysScanDescData { Relation heap_rel; /* catalog being scanned */ Relation irel; /* NULL if doing heap scan */ - struct HeapScanDescData *scan; /* only valid in heap-scan case */ - struct IndexScanDescData *iscan; /* only valid in index-scan case */ - struct SnapshotData *snapshot; /* snapshot to unregister at end of scan */ + struct TableScanDescData *scan; /* only valid in storage-scan case */ + struct IndexScanDescData *iscan; /* only valid in 
index-scan case */ + struct SnapshotData *snapshot; /* snapshot to unregister at end of scan */ + struct TupleTableSlot *slot; } SysScanDescData; #endif /* RELSCAN_H */ diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h index ccdc6de3ae5..f2913b8cff9 100644 --- a/src/include/access/tableam.h +++ b/src/include/access/tableam.h @@ -14,31 +14,497 @@ #ifndef TABLEAM_H #define TABLEAM_H +#include "access/relscan.h" +#include "access/sdir.h" #include "utils/guc.h" +#include "utils/rel.h" +#include "utils/snapshot.h" #define DEFAULT_TABLE_ACCESS_METHOD "heap" extern char *default_table_access_method; - +extern bool synchronize_seqscans; /* * API struct for a table AM. Note this must be allocated in a * server-lifetime manner, typically as a static const struct, which then gets * returned by FormData_pg_am.amhandler. + * + * I most cases it's not appropriate to directly call the callbacks directly, + * instead use the table_* wrapper functions. + * + * GetTableAmRoutine() asserts that required callbacks are filled in, remember + * to update when adding a callback. */ typedef struct TableAmRoutine { /* this must be set to T_TableAmRoutine */ NodeTag type; + + + /* ------------------------------------------------------------------------ + * Slot related callbacks. + * ------------------------------------------------------------------------ + */ + + /* + * Return slot implementation suitable for storing a tuple of this AM. + */ + const TupleTableSlotOps *(*slot_callbacks) (Relation rel); + + + /* ------------------------------------------------------------------------ + * Table scan callbacks. + * ------------------------------------------------------------------------ + */ + + /* + * Start a scan of `rel`. The callback has to return a TableScanDesc, + * which will typically be embedded in a larger, AM specific, struct. + * + * If nkeys != 0, the results need to be filtered by those scan keys. 
+ * + * pscan, if not NULL, will have already been initialized with + * parallelscan_initialize(), and has to be for the same relation. Will + * only be set coming from table_beginscan_parallel(). + * + * allow_{strat, sync, pagemode} specify whether a scan strategy, + * synchronized scans, or page mode may be used (although not every AM + * will support those). + * + * is_{bitmapscan, samplescan} specify whether the scan is inteded to + * support those types of scans. + * + * if temp_snap is true, the snapshot will need to be deallocated at + * scan_end. + */ + TableScanDesc (*scan_begin) (Relation rel, + Snapshot snapshot, + int nkeys, struct ScanKeyData *key, + ParallelTableScanDesc pscan, + bool allow_strat, + bool allow_sync, + bool allow_pagemode, + bool is_bitmapscan, + bool is_samplescan, + bool temp_snap); + + /* + * Release resources and deallocate scan. If TableScanDesc.temp_snap, + * TableScanDesc.rs_snapshot needs to be unregistered. + */ + void (*scan_end) (TableScanDesc scan); + + /* + * Restart relation scan. If set_params is set to true, allow{strat, + * sync, pagemode} (see scan_begin) changes should be taken into account. + */ + void (*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params, + bool allow_strat, bool allow_sync, bool allow_pagemode); + + /* + * Return next tuple from `scan`, store in slot. + */ + bool (*scan_getnextslot) (TableScanDesc scan, + ScanDirection direction, TupleTableSlot *slot); + + + /* ------------------------------------------------------------------------ + * Parallel table scan related functions. + * ------------------------------------------------------------------------ + */ + + /* + * Estimate the size of shared memory needed for a parallel scan of this + * relation. The snapshot does not need to be accounted for. + */ + Size (*parallelscan_estimate) (Relation rel); + + /* + * Initialize ParallelTableScanDesc for a parallel scan of this relation. 
+ * pscan will be sized according to parallelscan_estimate() for the same + * relation. + */ + Size (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan); + + /* + * Reinitilize `pscan` for a new scan. `rel` will be the same relation as + * when `pscan` was initialized by parallelscan_initialize. + */ + void (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan); + + + /* ------------------------------------------------------------------------ + * Index Scan Callbacks + * ------------------------------------------------------------------------ + */ + + /* + * Prepare to fetch tuples from the relation, as needed when fetching + * tuples for an index scan. The callback has to return a + * IndexFetchTableData, which the AM will typically embed in a larger + * structure with additional information. + * + * Tuples for an index scan can then be fetched via index_fetch_tuple. + */ + struct IndexFetchTableData *(*index_fetch_begin) (Relation rel); + + /* + * Reset index fetch. Typically this will release cross index fetch + * resources held in IndexFetchTableData. + */ + void (*index_fetch_reset) (struct IndexFetchTableData *data); + + /* + * Release resources and deallocate index fetch. + */ + void (*index_fetch_end) (struct IndexFetchTableData *data); + + /* + * Fetch tuple at `tid` into `slot`, after doing a visibility test + * according to `snapshot`. If a tuple was found and passed the visibility + * test, return true, false otherwise. + * + * Note that AMs that do not necessarily update indexes when indexed + * columns do not change, need to return the current/correct version of a + * tuple as appropriate, even if the tid points to an older version of the + * tuple. + * + * *call_again is false on the first call to index_fetch_tuple for a tid. 
+ * If there potentially is another tuple matching the tid, *call_again + * needs be set to true by index_fetch_tuple, signalling to the caller + * that index_fetch_tuple should be called again for the same tid. + * + * *all_dead should be set to true by index_fetch_tuple iff it is + * guaranteed that no backend needs to see that tuple. Index AMs can use + * that do avoid returning that tid in future searches. + */ + bool (*index_fetch_tuple) (struct IndexFetchTableData *scan, + ItemPointer tid, + Snapshot snapshot, + TupleTableSlot *slot, + bool *call_again, bool *all_dead); + + /* ------------------------------------------------------------------------ + * Callbacks for non-modifying operations on individual tuples + * ------------------------------------------------------------------------ + */ + + /* + * Does the tuple in `slot` satisfy `snapshot`? The slot needs to be of + * the appropriate type for the AM. + */ + bool (*tuple_satisfies_snapshot) (Relation rel, + TupleTableSlot *slot, + Snapshot snapshot); + } TableAmRoutine; +/* ---------------------------------------------------------------------------- + * Slot functions. + * ---------------------------------------------------------------------------- + */ + +/* + * Returns slot callbacks suitable for holding tuples of the appropriate type + * for the relation. Works for tables, views, foreign tables and partitioned + * tables. + */ +extern const TupleTableSlotOps *table_slot_callbacks(Relation rel); + +/* + * Returns slot using the callbacks returned by table_slot_callbacks(), and + * registers it on *reglist. + */ +extern TupleTableSlot *table_slot_create(Relation rel, List **reglist); + + +/* ---------------------------------------------------------------------------- + * Table scan functions. + * ---------------------------------------------------------------------------- + */ + +/* + * Start a scan of `rel`. 
Returned tuples pass a visibility test of + * `snapshot`, and if nkeys != 0, the results are filtered by those scan keys. + */ +static inline TableScanDesc +table_beginscan(Relation rel, Snapshot snapshot, + int nkeys, struct ScanKeyData *key) +{ + return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, + true, true, true, false, false, false); +} + +/* + * Like table_beginscan(), but for scanning catalog. It'll automatically use a + * snapshot appropriate for scanning catalog relations. + */ +extern TableScanDesc table_beginscan_catalog(Relation rel, int nkeys, + struct ScanKeyData *key); + +/* + * Like table_beginscan(), but table_beginscan_strat() offers an extended API + * that lets the caller control whether a nondefault buffer access strategy + * can be used, and whether syncscan can be chosen (possibly resulting in the + * scan not starting from block zero). Both of these default to true with + * plain table_beginscan. + */ +static inline TableScanDesc +table_beginscan_strat(Relation rel, Snapshot snapshot, + int nkeys, struct ScanKeyData *key, + bool allow_strat, bool allow_sync) +{ + return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, + allow_strat, allow_sync, true, + false, false, false); +} + + +/* + * table_beginscan_bm is an alternative entry point for setting up a + * TableScanDesc for a bitmap heap scan. Although that scan technology is + * really quite unlike a standard seqscan, there is just enough commonality to + * make it worth using the same data structure. + */ +static inline TableScanDesc +table_beginscan_bm(Relation rel, Snapshot snapshot, + int nkeys, struct ScanKeyData *key) +{ + return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, + false, false, true, true, false, false); +} + +/* + * table_beginscan_sampling is an alternative entry point for setting up a + * TableScanDesc for a TABLESAMPLE scan. 
As with bitmap scans, it's worth + * using the same data structure although the behavior is rather different. + * In addition to the options offered by table_beginscan_strat, this call + * also allows control of whether page-mode visibility checking is used. + */ +static inline TableScanDesc +table_beginscan_sampling(Relation rel, Snapshot snapshot, + int nkeys, struct ScanKeyData *key, + bool allow_strat, bool allow_sync, bool allow_pagemode) +{ + return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, + allow_strat, allow_sync, allow_pagemode, + false, true, false); +} + +/* + * table_beginscan_analyze is an alternative entry point for setting up a + * TableScanDesc for an ANALYZE scan. As with bitmap scans, it's worth using + * the same data structure although the behavior is rather different. + */ +static inline TableScanDesc +table_beginscan_analyze(Relation rel) +{ + return rel->rd_tableam->scan_begin(rel, NULL, 0, NULL, NULL, + true, false, true, + false, true, false); +} + +/* + * End relation scan. + */ +static inline void +table_endscan(TableScanDesc scan) +{ + scan->rs_rd->rd_tableam->scan_end(scan); +} + + +/* + * Restart a relation scan. + */ +static inline void +table_rescan(TableScanDesc scan, + struct ScanKeyData *key) +{ + scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false); +} + +/* + * Restart a relation scan after changing params. + * + * This call allows changing the buffer strategy, syncscan, and pagemode + * options before starting a fresh scan. Note that although the actual use of + * syncscan might change (effectively, enabling or disabling reporting), the + * previously selected startblock will be kept. + */ +static inline void +table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key, + bool allow_strat, bool allow_sync, bool allow_pagemode) +{ + scan->rs_rd->rd_tableam->scan_rescan(scan, key, true, + allow_strat, allow_sync, + allow_pagemode); +} + +/* + * Update snapshot used by the scan. 
+ */ +extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot); + + +/* + * Return next tuple from `scan`, store in slot. + */ +static inline bool +table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot) +{ + slot->tts_tableOid = RelationGetRelid(sscan->rs_rd); + return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot); +} + + +/* ---------------------------------------------------------------------------- + * Parallel table scan related functions. + * ---------------------------------------------------------------------------- + */ + +/* + * Estimate the size of shared memory needed for a parallel scan of this + * relation. + */ +extern Size table_parallelscan_estimate(Relation rel, Snapshot snapshot); + +/* + * Initialize ParallelTableScanDesc for a parallel scan of this + * relation. `pscan` needs to be sized according to parallelscan_estimate() + * for the same relation. Call this just once in the leader process; then, + * individual workers attach via table_beginscan_parallel. + */ +extern void table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan, Snapshot snapshot); + +/* + * Begin a parallel scan. `pscan` needs to have been initialized with + * table_parallelscan_initialize(), for the same relation. The initialization + * does not need to have happened in this backend. + * + * Caller must hold a suitable lock on the correct relation. + */ +extern TableScanDesc table_beginscan_parallel(Relation rel, ParallelTableScanDesc pscan); + +/* + * Restart a parallel scan. Call this in the leader process. Caller is + * responsible for making sure that all workers have finished the scan + * beforehand. 
+ */
+static inline void
+table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
+{
+	return rel->rd_tableam->parallelscan_reinitialize(rel, pscan);
+}
+
+
+/* ----------------------------------------------------------------------------
+ * Index scan related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Prepare to fetch tuples from the relation, as needed when fetching tuples
+ * for an index scan.
+ *
+ * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
+ */
+static inline IndexFetchTableData *
+table_index_fetch_begin(Relation rel)
+{
+	return rel->rd_tableam->index_fetch_begin(rel);
+}
+
+/*
+ * Reset index fetch. Typically this will release cross index fetch resources
+ * held in IndexFetchTableData.
+ */
+static inline void
+table_index_fetch_reset(struct IndexFetchTableData *scan)
+{
+	scan->rel->rd_tableam->index_fetch_reset(scan);
+}
+
+/*
+ * Release resources and deallocate index fetch.
+ */
+static inline void
+table_index_fetch_end(struct IndexFetchTableData *scan)
+{
+	scan->rel->rd_tableam->index_fetch_end(scan);
+}
+
+/*
+ * Fetches tuple at `tid` into `slot`, after doing a visibility test according
+ * to `snapshot`. If a tuple was found and passed the visibility test, returns
+ * true, false otherwise.
+ *
+ * *call_again needs to be false on the first call to table_index_fetch_tuple() for
+ * a tid. If there potentially is another tuple matching the tid, *call_again
+ * will be set to true, signalling that table_index_fetch_tuple() should be called
+ * again for the same tid.
+ *
+ * *all_dead will be set to true by table_index_fetch_tuple() iff it is guaranteed
+ * that no backend needs to see that tuple. Index AMs can use that to avoid
+ * returning that tid in future searches.
+ */
+static inline bool
+table_index_fetch_tuple(struct IndexFetchTableData *scan,
+						ItemPointer tid,
+						Snapshot snapshot,
+						TupleTableSlot *slot,
+						bool *call_again, bool *all_dead)
+{
+
+	return scan->rel->rd_tableam->index_fetch_tuple(scan, tid, snapshot,
+													slot, call_again,
+													all_dead);
+}
+
+
+/* ------------------------------------------------------------------------
+ * Functions for non-modifying operations on individual tuples
+ * ------------------------------------------------------------------------
+ */
 /*
+ * Return true iff tuple in slot satisfies the snapshot.
+ *
+ * This assumes the slot's tuple is valid, and of the appropriate type for the
+ * AM.
+ *
+ * Some AMs might modify the data underlying the tuple as a side-effect. If so
+ * they ought to mark the relevant buffer dirty.
+ */
+static inline bool
+table_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, Snapshot snapshot)
+{
+	return rel->rd_tableam->tuple_satisfies_snapshot(rel, slot, snapshot);
+}
+
+
+/* ----------------------------------------------------------------------------
+ * Helper functions to implement parallel scans for block oriented AMs.
+ * ----------------------------------------------------------------------------
+ */
+
+extern Size table_block_parallelscan_estimate(Relation rel);
+extern Size table_block_parallelscan_initialize(Relation rel,
+									ParallelTableScanDesc pscan);
+extern void table_block_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan);
+extern BlockNumber table_block_parallelscan_nextpage(Relation rel, ParallelBlockTableScanDesc pbscan);
+extern void table_block_parallelscan_startblock_init(Relation rel, ParallelBlockTableScanDesc pbscan);
+
+
+/* ----------------------------------------------------------------------------
  * Functions in tableamapi.c
+ * ----------------------------------------------------------------------------
  */
+
 extern const TableAmRoutine *GetTableAmRoutine(Oid amhandler);
 extern const TableAmRoutine *GetTableAmRoutineByAmId(Oid amoid);
 extern const TableAmRoutine *GetHeapamTableAmRoutine(void); |
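The *call_again contract documented above for table_index_fetch_tuple() is easier to follow in isolation. What follows is a minimal, self-contained toy sketch in C and not PostgreSQL code: ToyFetch, ToyAmRoutine, and every toy_* function are hypothetical stand-ins for IndexFetchTableData, TableAmRoutine, and the real callbacks. It assumes that a tid maps to a small chain of integer "tuple versions" and that a version counts as visible when it is non-negative. It only illustrates the dispatch through a callback table and the caller-side loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IndexFetchTableData. */
typedef struct ToyFetch ToyFetch;

/* Hypothetical stand-in for TableAmRoutine: one callback only. */
typedef struct ToyAmRoutine
{
	bool		(*index_fetch_tuple) (ToyFetch *scan, int tid,
									  int *value, bool *call_again);
} ToyAmRoutine;

struct ToyFetch
{
	const ToyAmRoutine *am;		/* callback table of the toy AM */
	const int  *chain;			/* versions reachable from one tid */
	size_t		chain_len;
	size_t		pos;			/* state kept across calls for the same tid */
};

/* Toy AM callback: return the next visible (non-negative) version. */
static bool
toy_am_index_fetch_tuple(ToyFetch *scan, int tid, int *value, bool *call_again)
{
	(void) tid;					/* the toy has a single chain */

	if (!*call_again)
		scan->pos = 0;			/* first call for this tid: start at the head */
	*call_again = false;

	while (scan->pos < scan->chain_len)
	{
		int			v = scan->chain[scan->pos++];

		if (v >= 0)
		{
			*value = v;
			/* more versions remain, so ask the caller to call again */
			*call_again = scan->pos < scan->chain_len;
			return true;
		}
	}
	return false;
}

static const ToyAmRoutine toy_am = {toy_am_index_fetch_tuple};

/* Wrapper in the style of the inline table_* functions: pure dispatch. */
static bool
toy_table_index_fetch_tuple(ToyFetch *scan, int tid, int *value, bool *call_again)
{
	assert(scan != NULL && call_again != NULL);
	return scan->am->index_fetch_tuple(scan, tid, value, call_again);
}

/* Caller side: *call_again starts out false, then loop while it is true. */
int
toy_sum_visible(ToyFetch *scan, int tid)
{
	bool		call_again = false;
	int			sum = 0;

	do
	{
		int			value;

		if (toy_table_index_fetch_tuple(scan, tid, &value, &call_again))
			sum += value;
	} while (call_again);

	return sum;
}

/* Demo: chain {5, -1, 7} has two visible versions. */
int
toy_demo(void)
{
	static const int chain[] = {5, -1, 7};
	ToyFetch	scan = {&toy_am, chain, sizeof(chain) / sizeof(chain[0]), 0};

	return toy_sum_visible(&scan, 42);
}
```

Calling toy_demo() walks the chain in two fetches: the first returns 5 and sets call_again, the second skips the invisible -1 and returns 7. A fetch may also return false after call_again was set, which is why the caller checks the return value on every iteration.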