From: Amit K. <ami...@en...> - 2013-04-26 13:09:16
In the UPDATE implementation in XC, once we get the source data along with the ctids, we start updating the records, but we do not take a row lock. So we can end up updating a completely different row if it has already been updated by some other update that ran between fetching the rows and updating them, and this results in data inconsistency. For an example, see: http://sourceforge.net/tracker/?func=detail&aid=3606317&group_id=311227&atid=1310232

PG acquires an exclusive row lock, fetches a revised version of the tuple if the tuple has changed (heap_update), and then runs the quals again to check whether the new tuple still satisfies them. XC does not do this, so a concurrent update happening after the ctid has been fetched but before the tuple with that ctid is updated can cause data inconsistency. The same happens for DELETE, and also for BEFORE trigger execution. The row should be locked before doing any of these operations, and the revised version of the tuple should be used.

So, for all three operations above, lock the rows to be processed by appending FOR UPDATE to the remote query. If there are coordinator quals, we should run FOR UPDATE only for those rows that satisfy the quals. Once this is done, we can safely assume that the row is locked by the time that row is actually processed (update/delete/trigger), and so skip the "lock-and-fetch-revised-tuple-and-rerun-qual" step that PG does.
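As a minimal sketch of the interleaving (the table t and the concrete ctid/node id values are illustrative only, not taken from the tracker item above):

-- Illustrative setup:
CREATE TABLE t (id int, val int) DISTRIBUTE BY HASH (id);
INSERT INTO t VALUES (1, 0);

-- Session A: the coordinator's steps for "UPDATE t SET val = 1 WHERE id = 1"
BEGIN;
SELECT id, val, ctid, xc_node_id FROM ONLY t WHERE id = 1;  -- fetch, no row lock
-- suppose this returns ctid = (0,1)

-- Session B, interleaved before session A issues its ctid-based update:
UPDATE t SET val = val + 10 WHERE id = 1;  -- creates a new tuple version, commits

-- Session A continues, driving the update by the now-stale ctid:
UPDATE ONLY t SET val = 1
  WHERE ctid = '(0,1)' AND xc_node_id = 12345;  -- stale ctid: session B's
                                                -- committed change is not seen
COMMIT;

-- With the proposed fix, the fetch itself takes the row lock, so session B
-- blocks until session A finishes and the fetched ctid stays valid:
SELECT id, val, ctid, xc_node_id FROM ONLY t WHERE id = 1 FOR UPDATE;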
Consider this query:

UPDATE employees SET sales_count = sales_count + 1 FROM accounts
WHERE accounts.name = 'Acme'
  AND employees.empname = f(employees.empname)
  AND employees.id = accounts.sales_person;

                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Update on public.employees
   Node expr: employees.id
   Remote query: UPDATE ONLY employees SET sales_count = $3 WHERE ((employees.ctid = $4) AND (employees.xc_node_id = $5))
   ->  Nested Loop
         Output: employees.id, NULL::character varying, (employees.sales_count + 1), employees.ctid, employees.xc_node_id, accounts.ctid
         Join Filter: (employees.id = accounts.sales_person)
         ->  Data Node Scan on employees "_REMOTE_TABLE_QUERY_"
               Output: employees.id, employees.sales_count, employees.ctid, employees.xc_node_id
               Remote query: SELECT id, sales_count, ctid, xc_node_id, empname FROM ONLY employees WHERE true
               Coordinator quals: ((employees.empname)::text = (f(employees.empname))::text)
         ->  Data Node Scan on accounts "_REMOTE_TABLE_QUERY_"
               Output: accounts.ctid, accounts.sales_person
               Remote query: SELECT ctid, sales_person FROM ONLY accounts WHERE ((name)::text = 'Acme'::text)
(13 rows)

The idea is to make the plan look something like this (the changed parts are the employees.* output column, the "ctid IN ($1) ... FOR UPDATE OF employees" remote query, and the new child scan that fetches only ctid/xc_node_id):

                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Update on public.employees
   Node expr: employees.id
   Remote query: UPDATE ONLY employees SET sales_count = $3 WHERE ((employees.ctid = $4) AND (employees.xc_node_id = $5))
   ->  Nested Loop
         Output: employees.id, NULL::character varying, (employees.sales_count + 1), employees.ctid, employees.xc_node_id, accounts.ctid
         Join Filter: (employees.id = accounts.sales_person)
         ->  Data Node Scan on employees "_REMOTE_TABLE_QUERY_"  (or some different node type)
               Output: employees.id, employees.sales_count, employees.ctid, employees.xc_node_id, employees.*
               Remote query: SELECT id, sales_count, ctid, xc_node_id, employees.* FROM ONLY employees WHERE employees.ctid IN ($1) FOR UPDATE OF employees
               ->  Data Node Scan on employees "_REMOTE_TABLE_QUERY_"
                     Output: employees.ctid, employees.xc_node_id
                     Remote query: SELECT ctid, xc_node_id, empname FROM ONLY employees WHERE true
                     Coordinator quals: ((employees.empname)::text = (f(employees.empname))::text)
         ->  Data Node Scan on accounts "_REMOTE_TABLE_QUERY_"
               Output: accounts.ctid, accounts.sales_person
               Remote query: SELECT ctid, sales_person FROM ONLY accounts WHERE ((name)::text = 'Acme'::text)

In the above hypothetical plan, the upper RemoteQuery node has a subnode that fetches only ctid/xc_node_id. The results are collected and fed into the $1 parameter of the parent node's IN clause. On its first execution, the upper RemoteQuery node (or some new type of node) will have to do special processing: it will call ExecProcNode() repeatedly until it has fetched all the ctids from the child node, and then on subsequent executions it will keep returning the fetched tuples. The situation is complicated further because all the ctids in a given remote query must belong to the same node_id, so the node needs to run this cycle once per node, which means it has to group the collected ctids by the node_id they belong to. All of this would happen transparently to the caller of this RemoteQuery node. In the above plan, the Nested Loop node would get the rows from employees as usual, but effectively they would be locked.

If instead of ctid IN ($1) we use ctid = $1, the upper node will not do batch processing: it will retrieve the lower node's ctids one by one and return the SELECT FOR UPDATE row for each ctid to the caller, so the upper RemoteQuery node would not have to do any special processing. But that is obviously slower. I will check how involved it gets with ctid IN ($1).

The create_remotequery_plan() function itself would create the subnode and save it in its lefttree. The parent node would be marked as read-only = false (or something similar) to indicate to the combiner that the primary node should be used to serialize the FOR UPDATE locking.

Comments and suggestions welcome.

Thanks
-Amit