From: Amit K. <ami...@en...> - 2013-04-26 13:09:16
In the UPDATE implementation in XC, once we get the source data along with the ctids, we start updating the records, but we do not take a row lock. So we can end up updating a completely different row if it has already been updated by some other update that ran between fetching the rows and updating them, and this results in data inconsistency. For an example, see: http://sourceforge.net/tracker/?func=detail&aid=3606317&group_id=311227&atid=1310232

PG acquires an exclusive row lock, fetches a revised version of the tuple if the tuple has changed (heap_update), and then runs the quals again to check whether the new tuple still satisfies them. XC does not do this, so a concurrent update happening after the ctid has been fetched but before the tuple with that ctid is updated can cause data inconsistency. The same happens for DELETE, and also for BEFORE trigger execution. The row should be locked before doing any of these operations, and the revised version of the tuple should be used.

So, for all three operations above, lock the rows to be processed by appending FOR UPDATE to the remote query. If there are coordinator quals, we should run FOR UPDATE only for those rows that satisfy the quals. Once this is done, we can safely assume that the row is locked by the time that row is actually processed (update/delete/trigger), and so skip the "lock-and-fetch-revised-tuple-and-rerun-qual" step that PG does.
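As a minimal sketch of the interleaving (the table t and the concrete ctid/node id values are illustrative only, not taken from the tracker item above):

-- Illustrative setup:
CREATE TABLE t (id int, val int) DISTRIBUTE BY HASH (id);
INSERT INTO t VALUES (1, 0);

-- Session A: the coordinator's steps for "UPDATE t SET val = 1 WHERE id = 1"
BEGIN;
SELECT id, val, ctid, xc_node_id FROM ONLY t WHERE id = 1;  -- fetch, no row lock
-- suppose this returns ctid = (0,1)

-- Session B, interleaved before session A issues its ctid-based update:
UPDATE t SET val = val + 10 WHERE id = 1;  -- creates a new tuple version, commits

-- Session A continues, driving the update by the now-stale ctid:
UPDATE ONLY t SET val = 1
  WHERE ctid = '(0,1)' AND xc_node_id = 12345;  -- stale ctid: session B's
                                                -- committed change is not seen
COMMIT;

-- With the proposed fix, the fetch itself takes the row lock, so session B
-- blocks until session A finishes and the fetched ctid stays valid:
SELECT id, val, ctid, xc_node_id FROM ONLY t WHERE id = 1 FOR UPDATE;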
Consider this query:

UPDATE employees SET sales_count = sales_count + 1 FROM accounts
WHERE accounts.name = 'Acme'
  AND employees.empname = f(employees.empname)
  AND employees.id = accounts.sales_person;

                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Update on public.employees
   Node expr: employees.id
   Remote query: UPDATE ONLY employees SET sales_count = $3 WHERE ((employees.ctid = $4) AND (employees.xc_node_id = $5))
   ->  Nested Loop
         Output: employees.id, NULL::character varying, (employees.sales_count + 1), employees.ctid, employees.xc_node_id, accounts.ctid
         Join Filter: (employees.id = accounts.sales_person)
         ->  Data Node Scan on employees "_REMOTE_TABLE_QUERY_"
               Output: employees.id, employees.sales_count, employees.ctid, employees.xc_node_id
               Remote query: SELECT id, sales_count, ctid, xc_node_id, empname FROM ONLY employees WHERE true
               Coordinator quals: ((employees.empname)::text = (f(employees.empname))::text)
         ->  Data Node Scan on accounts "_REMOTE_TABLE_QUERY_"
               Output: accounts.ctid, accounts.sales_person
               Remote query: SELECT ctid, sales_person FROM ONLY accounts WHERE ((name)::text = 'Acme'::text)
(13 rows)

The idea is to make the plan look something like this (the changed parts are the employees.* output column, the "ctid IN ($1) ... FOR UPDATE OF employees" remote query, and the new child scan that fetches only ctid/xc_node_id):

                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Update on public.employees
   Node expr: employees.id
   Remote query: UPDATE ONLY employees SET sales_count = $3 WHERE ((employees.ctid = $4) AND (employees.xc_node_id = $5))
   ->  Nested Loop
         Output: employees.id, NULL::character varying, (employees.sales_count + 1), employees.ctid, employees.xc_node_id, accounts.ctid
         Join Filter: (employees.id = accounts.sales_person)
         ->  Data Node Scan on employees "_REMOTE_TABLE_QUERY_"  (or some different node type)
               Output: employees.id, employees.sales_count, employees.ctid, employees.xc_node_id, employees.*
               Remote query: SELECT id, sales_count, ctid, xc_node_id, employees.* FROM ONLY employees WHERE employees.ctid IN ($1) FOR UPDATE OF employees
               ->  Data Node Scan on employees "_REMOTE_TABLE_QUERY_"
                     Output: employees.ctid, employees.xc_node_id
                     Remote query: SELECT ctid, xc_node_id, empname FROM ONLY employees WHERE true
                     Coordinator quals: ((employees.empname)::text = (f(employees.empname))::text)
         ->  Data Node Scan on accounts "_REMOTE_TABLE_QUERY_"
               Output: accounts.ctid, accounts.sales_person
               Remote query: SELECT ctid, sales_person FROM ONLY accounts WHERE ((name)::text = 'Acme'::text)

In the above hypothetical plan, the upper RemoteQuery node has a subnode that fetches only ctid/xc_node_id. The results are collected and fed into the $1 parameter of the parent node's IN clause. On its first execution, the upper RemoteQuery node (or some new type of node) will have to do special processing: it will call ExecProcNode() repeatedly until it has fetched all the ctids from the child node, and then on subsequent executions it will keep returning the fetched tuples. The situation is complicated further because all the ctids in a given remote query must belong to the same node_id, so the node needs to run this cycle once per node, which means it has to group the collected ctids by the node_id they belong to. All of this would happen transparently to the caller of this RemoteQuery node. In the above plan, the Nested Loop node would get the rows from employees as usual, but effectively they would be locked.

If instead of ctid IN ($1) we use ctid = $1, the upper node will not do batch processing: it will retrieve the lower node's ctids one by one and return the SELECT FOR UPDATE row for each ctid to the caller, so the upper RemoteQuery node would not have to do any special processing. But that is obviously slower. I will check how involved it gets with ctid IN ($1).

The create_remotequery_plan() function itself would create the subnode and save it in its lefttree. The parent node would be marked as read-only = false (or something similar) to indicate to the combiner that the primary node should be used to serialize the FOR UPDATE locking.

Comments and suggestions welcome.

Thanks
-Amit