From: Abbas B. <abb...@en...> - 2012-06-15 03:07:41
|
Hi, In a multi-statement transaction each statement is given a command identifier starting from zero and incrementing for each statement. These command indentifers are required for extra tracking because each statement has its own visibility rules with in the transaction. For example, a cursor’s contents must remain unchanged even if later statements in the same transaction modify rows. Such tracking is implemented using system command id columns cmin/cmax, which is internally actually is a single column. cmin/cmax come into play in case of multi-statement transactions only, they are both zero otherwise. cmin "The command identifier of the statement within the inserting transaction." cmax "The command identifier of the statement within the deleting transaction." Here are the visibility rules (taken from comments of tqual.c) ( // A heap tuple is valid "now" iff Xmin == my-transaction && // inserted by the current transaction Cmin < my-command && // before this command, and ( Xmax is null || // the row has not been deleted, or ( Xmax == my-transaction && // it was deleted by the current transaction Cmax >= my-command // but not before this command, ) ) ) || // or ( Xmin is committed && // the row was inserted by a committed transaction, and ( Xmax is null || // the row has not been deleted, or ( Xmax == my-transaction && // the row is being deleted by this transaction Cmax >= my-command) || // but it's not deleted "yet", or ( Xmax != my-transaction && // the row was deleted by another transaction Xmax is not committed // that has not been committed ) ) ) ) Because cmin and cmax are internally a single system column, it is therefore not possible to simply record the status of a row that is created and expired in the same multi-statement transaction. For that reason, a special combo command id is created that references a local memory hash that contains the actual cmin and cmax values. It means that if combo id is being used the number we are seeing would not be the cmin or cmax it will be an index into a local array that contains a structure with has the actual cmin and cmax values. The following queries (taken mostly from http://momjian.us/main/writings/pgsql/mvcc.pdf) use the contrib module pageinspect, which allows visibility of internal heap page structures and all stored rows, including those not visible in the current snapshot. (Bit 0x0020 is defined as HEAP_COMBOCID.) We are exploring 3 examples here: 1) INSERT & DELETE in a single transaction 2) INSERT & UPDATE in a single transaction 3) INSERT from two different transactions & UPDATE from one test=# drop table mvcc_demo; DROP TABLE test=# test=# create table mvcc_demo (val int); CREATE TABLE test=# test=# TRUNCATE mvcc_demo; TRUNCATE TABLE test=# test=# BEGIN; BEGIN test=# DELETE FROM mvcc_demo; -- increment command id to show that combo id would be different DELETE 0 test=# DELETE FROM mvcc_demo; DELETE 0 test=# DELETE FROM mvcc_demo; DELETE 0 test=# INSERT INTO mvcc_demo VALUES (1); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (2); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (3); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80685 | 0 | 3 | f 80685 | 0 | 4 | f 80685 | 0 | 5 | f (3 rows) test=# test=# DELETE FROM mvcc_demo; DELETE 3 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80685 | 80685 | 0 | t 80685 | 80685 | 1 | t 80685 | 80685 | 2 | t (3 rows) Note that since is_combocid is true the numbers are not cmin/cmax they are actually the indexes of the internal array already explained above. combo id index 0 would contain cmin 3, cmax 6 combo id index 1 would contain cmin 4, cmax 6 combo id index 2 would contain cmin 5, cmax 6 test=# test=# END; COMMIT test=# test=# test=# TRUNCATE mvcc_demo; TRUNCATE TABLE test=# test=# test=# test=# BEGIN; BEGIN test=# INSERT INTO mvcc_demo VALUES (1); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (2); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (3); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80675 | 0 | 0 | f 80675 | 0 | 1 | f 80675 | 0 | 2 | f (3 rows) test=# test=# UPDATE mvcc_demo SET val = val * 10; UPDATE 3 test=# test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80675 | 80675 | 0 | t 80675 | 80675 | 1 | t 80675 | 80675 | 2 | t 80675 | 0 | 3 | f 80675 | 0 | 3 | f 80675 | 0 | 3 | f (6 rows) test=# test=# END; COMMIT test=# test=# test=# TRUNCATE mvcc_demo; TRUNCATE TABLE test=# -- From one psql issue test=# INSERT INTO mvcc_demo VALUES (1); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80677 | 0 | 0 | f (1 row) test=# -- From another issue test=# BEGIN; BEGIN test=# INSERT INTO mvcc_demo VALUES (2); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (3); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (4); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80677 | 0 | 0 | f 80678 | 0 | 0 | f 80678 | 0 | 1 | f 80678 | 0 | 2 | f (4 rows) test=# test=# UPDATE mvcc_demo SET val = val * 10; UPDATE 4 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80678 | 80678 | 0 | t 80678 | 80678 | 1 | t 80678 | 80678 | 2 | t 80677 | 80678 | 3 | f 80678 | 0 | 3 | f 80678 | 0 | 3 | f 80678 | 0 | 3 | f 80678 | 0 | 3 | f (8 rows) test=# test=# -- Before finishing this, issue these from the first psql test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80678 | 80678 | 0 | t 80678 | 80678 | 1 | t 80678 | 80678 | 2 | t 80677 | 80678 | 3 | f 80678 | 0 | 3 | f 80678 | 0 | 3 | f 80678 | 0 | 3 | f 80678 | 0 | 3 | f (8 rows) test=# END; COMMIT Now consider the case we are trying to solve drop table tt1; create table tt1(f1 int); BEGIN; insert into tt1 values(1); declare c50 cursor for select * from tt1; -- should show one row only insert into tt1 values(2); fetch all from c50; COMMIT; Consider Data node 1 log (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE] (b) [exec_simple_query][1026][drop table tt1;] (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE] (f) [exec_simple_query][1026][create table tt1(f1 int);] (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE] (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM tt1] (m) [exec_simple_query][1026][COMMIT TRANSACTION] The cursor currently shows both inserted rows because command id at data node in step (j) is 0 step (k) is 1 & step (l) is 2 Where as we need command ids to be step (j) should be 0 step (k) should be 2 & step (l) should be 1 This will solve the cursor visibility problem. To implement this I suggest we send command IDs to data nodes from the coordinator like we send gxid. The only difference will be that we do not need to take command IDs from GTM since they are only valid with in the transaction. See this example test=# select xmin,xmax,cmin,cmax,* from tt1; xmin | xmax | cmin | cmax | f1 ------+------+------+------+---- (0 rows) test=# begin; BEGIN test=# insert into tt1 values(1); INSERT 0 1 test=# select xmin,xmax,cmin,cmax,* from tt1; xmin | xmax | cmin | cmax | f1 -------+------+------+------+---- 80615 | 0 | 0 | 0 | 1 (1 row) test=# insert into tt1 values(2); INSERT 0 1 test=# select xmin,xmax,cmin,cmax,* from tt1; xmin | xmax | cmin | cmax | f1 -------+------+------+------+---- 80615 | 0 | 0 | 0 | 1 80615 | 0 | 1 | 1 | 2 (2 rows) test=# insert into tt1 values(3); INSERT 0 1 test=# select xmin,xmax,cmin,cmax,* from tt1; xmin | xmax | cmin | cmax | f1 -------+------+------+------+---- 80615 | 0 | 0 | 0 | 1 80615 | 0 | 1 | 1 | 2 80615 | 0 | 2 | 2 | 3 (3 rows) test=# insert into tt1 values(4); INSERT 0 1 test=# select xmin,xmax,cmin,cmax,* from tt1; xmin | xmax | cmin | cmax | f1 -------+------+------+------+---- 80615 | 0 | 0 | 0 | 1 80615 | 0 | 1 | 1 | 2 80615 | 0 | 2 | 2 | 3 80615 | 0 | 3 | 3 | 4 (4 rows) test=# end; COMMIT test=# test=# test=# select xmin,xmax,cmin,cmax,* from tt1; xmin | xmax | cmin | cmax | f1 -------+------+------+------+---- 80615 | 0 | 0 | 0 | 1 80615 | 0 | 1 | 1 | 2 80615 | 0 | 2 | 2 | 3 80615 | 0 | 3 | 3 | 4 (4 rows) test=# insert into tt1 values(5); INSERT 0 1 test=# select xmin,xmax,cmin,cmax,* from tt1; xmin | xmax | cmin | cmax | f1 -------+------+------+------+---- 80615 | 0 | 0 | 0 | 1 80615 | 0 | 1 | 1 | 2 80615 | 0 | 2 | 2 | 3 80615 | 0 | 3 | 3 | 4 80616 | 0 | 0 | 0 | 5 (5 rows) test=# insert into tt1 values(6); INSERT 0 1 test=# test=# test=# select xmin,xmax,cmin,cmax,* from tt1; xmin | xmax | cmin | cmax | f1 -------+------+------+------+---- 80615 | 0 | 0 | 0 | 1 80615 | 0 | 1 | 1 | 2 80615 | 0 | 2 | 2 | 3 80615 | 0 | 3 | 3 | 4 80616 | 0 | 0 | 0 | 5 80617 | 0 | 0 | 0 | 6 (6 rows) Note that at the end of the multi-statement transaction the command id gets reset to zero. -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company |
From: Abbas B. <abb...@en...> - 2012-06-18 20:32:00
Attachments:
1_cvp.patch
|
PFA a WIP patch implementing the design presented earlier. The patch is WIP because it still has and FIXME and it shows some regression failures that need to be fixed, but other than that it confirms that the suggested design would work fine. The following test cases now work fine drop table tt1; create table tt1(f1 int) distribute by replication; BEGIN; insert into tt1 values(1); declare c50 cursor for select * from tt1; insert into tt1 values(2); fetch all from c50; COMMIT; truncate table tt1; BEGIN; declare c50 cursor for select * from tt1; insert into tt1 values(1); insert into tt1 values(2); fetch all from c50; COMMIT; truncate table tt1; BEGIN; insert into tt1 values(1); insert into tt1 values(2); declare c50 cursor for select * from tt1; insert into tt1 values(3); fetch all from c50; COMMIT; truncate table tt1; BEGIN; insert into tt1 values(1); declare c50 cursor for select * from tt1; insert into tt1 values(2); declare c51 cursor for select * from tt1; insert into tt1 values(3); fetch all from c50; fetch all from c51; COMMIT; truncate table tt1; BEGIN; insert into tt1 values(1); declare c50 cursor for select * from tt1; declare c51 cursor for select * from tt1; insert into tt1 values(2); insert into tt1 values(3); fetch all from c50; fetch all from c51; COMMIT; truncate table tt1; On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt <abb...@en...>wrote: > Hi, > > In a multi-statement transaction each statement is given a command > identifier > starting from zero and incrementing for each statement. > These command indentifers are required for extra tracking because each > statement has its own visibility rules with in the transaction. > For example, a cursor’s contents must remain unchanged even if later > statements in the > same transaction modify rows. Such tracking is implemented using system > command id > columns cmin/cmax, which is internally actually is a single column. > > cmin/cmax come into play in case of multi-statement transactions only, > they are both zero otherwise. > > cmin "The command identifier of the statement within the inserting > transaction." > cmax "The command identifier of the statement within the deleting > transaction." > > Here are the visibility rules (taken from comments of tqual.c) > > ( // A heap tuple is valid "now" iff > Xmin == my-transaction && // inserted by the current > transaction > Cmin < my-command && // before this command, and > ( > Xmax is null || // the row has not been deleted, > or > ( > Xmax == my-transaction && // it was deleted by the current > transaction > Cmax >= my-command // but not before this command, > ) > ) > ) > || // or > ( > Xmin is committed && // the row was inserted by a > committed transaction, and > ( > Xmax is null || // the row has not been deleted, > or > ( > Xmax == my-transaction && // the row is being deleted by > this transaction > Cmax >= my-command) || // but it's not deleted "yet", or > ( > Xmax != my-transaction && // the row was deleted by another > transaction > Xmax is not committed // that has not been committed > ) > ) > ) > ) > > Because cmin and cmax are internally a single system column, > it is therefore not possible to simply record the status of a row > that is created and expired in the same multi-statement transaction. > For that reason, a special combo command id is created that references > a local memory hash that contains the actual cmin and cmax values. > It means that if combo id is being used the number we are seeing > would not be the cmin or cmax it will be an index into a local > array that contains a structure with has the actual cmin and cmax values. > > The following queries (taken mostly from > http://momjian.us/main/writings/pgsql/mvcc.pdf) > use the contrib module pageinspect, which allows > visibility of internal heap page structures and all stored rows, > including those not visible in the current snapshot. > (Bit 0x0020 is defined as HEAP_COMBOCID.) > > We are exploring 3 examples here: > 1) INSERT & DELETE in a single transaction > 2) INSERT & UPDATE in a single transaction > 3) INSERT from two different transactions & UPDATE from one > > test=# drop table mvcc_demo; > DROP TABLE > test=# > test=# create table mvcc_demo (val int); > CREATE TABLE > test=# > test=# TRUNCATE mvcc_demo; > TRUNCATE TABLE > test=# > test=# BEGIN; > BEGIN > test=# DELETE FROM mvcc_demo; -- increment command id to show that combo > id would be different > DELETE 0 > test=# DELETE FROM mvcc_demo; > DELETE 0 > test=# DELETE FROM mvcc_demo; > DELETE 0 > test=# INSERT INTO mvcc_demo VALUES (1); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (2); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (3); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80685 | 0 | 3 | f > 80685 | 0 | 4 | f > 80685 | 0 | 5 | f > (3 rows) > > test=# > test=# DELETE FROM mvcc_demo; > DELETE 3 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80685 | 80685 | 0 | t > 80685 | 80685 | 1 | t > 80685 | 80685 | 2 | t > (3 rows) > > Note that since is_combocid is true the numbers are not cmin/cmax they are > actually > the indexes of the internal array already explained above. > combo id index 0 would contain cmin 3, cmax 6 > combo id index 1 would contain cmin 4, cmax 6 > combo id index 2 would contain cmin 5, cmax 6 > > test=# > test=# END; > COMMIT > test=# > test=# > test=# TRUNCATE mvcc_demo; > TRUNCATE TABLE > test=# > test=# > test=# > test=# BEGIN; > BEGIN > test=# INSERT INTO mvcc_demo VALUES (1); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (2); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (3); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80675 | 0 | 0 | f > 80675 | 0 | 1 | f > 80675 | 0 | 2 | f > (3 rows) > > test=# > test=# UPDATE mvcc_demo SET val = val * 10; > UPDATE 3 > test=# > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80675 | 80675 | 0 | t > 80675 | 80675 | 1 | t > 80675 | 80675 | 2 | t > 80675 | 0 | 3 | f > 80675 | 0 | 3 | f > 80675 | 0 | 3 | f > (6 rows) > > test=# > test=# END; > COMMIT > test=# > test=# > test=# TRUNCATE mvcc_demo; > TRUNCATE TABLE > test=# > > -- From one psql issue > test=# INSERT INTO mvcc_demo VALUES (1); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS > is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80677 | 0 | 0 | f > (1 row) > > > test=# -- From another issue > test=# BEGIN; > BEGIN > test=# INSERT INTO mvcc_demo VALUES (2); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (3); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (4); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80677 | 0 | 0 | f > 80678 | 0 | 0 | f > 80678 | 0 | 1 | f > 80678 | 0 | 2 | f > (4 rows) > > test=# > test=# UPDATE mvcc_demo SET val = val * 10; > UPDATE 4 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80678 | 80678 | 0 | t > 80678 | 80678 | 1 | t > 80678 | 80678 | 2 | t > 80677 | 80678 | 3 | f > 80678 | 0 | 3 | f > 80678 | 0 | 3 | f > 80678 | 0 | 3 | f > 80678 | 0 | 3 | f > (8 rows) > > test=# > > test=# -- Before finishing this, issue these from the first psql > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS > is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80678 | 80678 | 0 | t > 80678 | 80678 | 1 | t > 80678 | 80678 | 2 | t > 80677 | 80678 | 3 | f > 80678 | 0 | 3 | f > 80678 | 0 | 3 | f > 80678 | 0 | 3 | f > 80678 | 0 | 3 | f > (8 rows) > > test=# END; > COMMIT > > > Now consider the case we are trying to solve > > drop table tt1; > create table tt1(f1 int); > > BEGIN; > insert into tt1 values(1); > declare c50 cursor for select * from tt1; -- should show one row only > insert into tt1 values(2); > fetch all from c50; > COMMIT; > > > Consider Data node 1 log > > (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read > committed READ WRITE] > (b) [exec_simple_query][1026][drop table tt1;] > (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] > (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] > (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read > committed READ WRITE] > (f) [exec_simple_query][1026][create table tt1(f1 int);] > (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] > (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] > (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read > committed READ WRITE] > (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] > (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] > (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM tt1] > (m) [exec_simple_query][1026][COMMIT TRANSACTION] > > The cursor currently shows both inserted rows because command id at data > node in > step (j) is 0 > step (k) is 1 & > step (l) is 2 > > Where as we need command ids to be > > step (j) should be 0 > step (k) should be 2 & > step (l) should be 1 > > This will solve the cursor visibility problem. > > To implement this I suggest we send command IDs to data nodes from the > coordinator > like we send gxid. The only difference will be that we do not need to take > command IDs > from GTM since they are only valid with in the transaction. > > See this example > > test=# select xmin,xmax,cmin,cmax,* from tt1; > xmin | xmax | cmin | cmax | f1 > ------+------+------+------+---- > (0 rows) > > test=# begin; > BEGIN > test=# insert into tt1 values(1); > INSERT 0 1 > test=# select xmin,xmax,cmin,cmax,* from tt1; > xmin | xmax | cmin | cmax | f1 > -------+------+------+------+---- > 80615 | 0 | 0 | 0 | 1 > (1 row) > > test=# insert into tt1 values(2); > INSERT 0 1 > test=# select xmin,xmax,cmin,cmax,* from tt1; > xmin | xmax | cmin | cmax | f1 > -------+------+------+------+---- > 80615 | 0 | 0 | 0 | 1 > 80615 | 0 | 1 | 1 | 2 > (2 rows) > > test=# insert into tt1 values(3); > INSERT 0 1 > test=# select xmin,xmax,cmin,cmax,* from tt1; > xmin | xmax | cmin | cmax | f1 > -------+------+------+------+---- > 80615 | 0 | 0 | 0 | 1 > 80615 | 0 | 1 | 1 | 2 > 80615 | 0 | 2 | 2 | 3 > (3 rows) > > test=# insert into tt1 values(4); > INSERT 0 1 > test=# select xmin,xmax,cmin,cmax,* from tt1; > xmin | xmax | cmin | cmax | f1 > -------+------+------+------+---- > 80615 | 0 | 0 | 0 | 1 > 80615 | 0 | 1 | 1 | 2 > 80615 | 0 | 2 | 2 | 3 > 80615 | 0 | 3 | 3 | 4 > (4 rows) > > test=# end; > COMMIT > test=# > test=# > test=# select xmin,xmax,cmin,cmax,* from tt1; > xmin | xmax | cmin | cmax | f1 > -------+------+------+------+---- > 80615 | 0 | 0 | 0 | 1 > 80615 | 0 | 1 | 1 | 2 > 80615 | 0 | 2 | 2 | 3 > 80615 | 0 | 3 | 3 | 4 > (4 rows) > > test=# insert into tt1 values(5); > INSERT 0 1 > test=# select xmin,xmax,cmin,cmax,* from tt1; > xmin | xmax | cmin | cmax | f1 > -------+------+------+------+---- > 80615 | 0 | 0 | 0 | 1 > 80615 | 0 | 1 | 1 | 2 > 80615 | 0 | 2 | 2 | 3 > 80615 | 0 | 3 | 3 | 4 > 80616 | 0 | 0 | 0 | 5 > (5 rows) > > test=# insert into tt1 values(6); > INSERT 0 1 > test=# > test=# > test=# select xmin,xmax,cmin,cmax,* from tt1; > xmin | xmax | cmin | cmax | f1 > -------+------+------+------+---- > 80615 | 0 | 0 | 0 | 1 > 80615 | 0 | 1 | 1 | 2 > 80615 | 0 | 2 | 2 | 3 > 80615 | 0 | 3 | 3 | 4 > 80616 | 0 | 0 | 0 | 5 > 80617 | 0 | 0 | 0 | 6 > (6 rows) > > Note that at the end of the multi-statement transaction the command id > gets reset to zero. > > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Michael P. <mic...@gm...> - 2012-06-19 05:26:23
|
Hi, I expect pgxc_node_send_cmd_id to have some impact on performance, so be sure to send it to remote Datanodes really only if necessary. You should put more severe conditions blocking this function cid can easily get incremented in Postgres. Regards, On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt <abb...@en...>wrote: > PFA a WIP patch implementing the design presented earlier. > The patch is WIP because it still has and FIXME and it shows some > regression failures that need to be fixed, but other than that it confirms > that the suggested design would work fine. The following test cases now > work fine > > drop table tt1; > create table tt1(f1 int) distribute by replication; > > > BEGIN; > insert into tt1 values(1); > declare c50 cursor for select * from tt1; > insert into tt1 values(2); > fetch all from c50; > COMMIT; > truncate table tt1; > > BEGIN; > > declare c50 cursor for select * from tt1; > insert into tt1 values(1); > > insert into tt1 values(2); > fetch all from c50; > COMMIT; > truncate table tt1; > > > BEGIN; > insert into tt1 values(1); > insert into tt1 values(2); > > declare c50 cursor for select * from tt1; > insert into tt1 values(3); > > fetch all from c50; > COMMIT; > truncate table tt1; > > > BEGIN; > insert into tt1 values(1); > declare c50 cursor for select * from tt1; > insert into tt1 values(2); > declare c51 cursor for select * from tt1; > insert into tt1 values(3); > fetch all from c50; > fetch all from c51; > COMMIT; > truncate table tt1; > > > BEGIN; > insert into tt1 values(1); > declare c50 cursor for select * from tt1; > declare c51 cursor for select * from tt1; > insert into tt1 values(2); > insert into tt1 values(3); > fetch all from c50; > fetch all from c51; > COMMIT; > truncate table tt1; > > > On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt <abb...@en...>wrote: > >> Hi, >> >> In a multi-statement transaction each statement is given a command >> identifier >> starting from zero and incrementing for each statement. >> These command indentifers are required for extra tracking because each >> statement has its own visibility rules with in the transaction. >> For example, a cursor’s contents must remain unchanged even if later >> statements in the >> same transaction modify rows. Such tracking is implemented using system >> command id >> columns cmin/cmax, which is internally actually is a single column. >> >> cmin/cmax come into play in case of multi-statement transactions only, >> they are both zero otherwise. >> >> cmin "The command identifier of the statement within the inserting >> transaction." >> cmax "The command identifier of the statement within the deleting >> transaction." >> >> Here are the visibility rules (taken from comments of tqual.c) >> >> ( // A heap tuple is valid "now" >> iff >> Xmin == my-transaction && // inserted by the current >> transaction >> Cmin < my-command && // before this command, and >> ( >> Xmax is null || // the row has not been deleted, >> or >> ( >> Xmax == my-transaction && // it was deleted by the current >> transaction >> Cmax >= my-command // but not before this command, >> ) >> ) >> ) >> || // or >> ( >> Xmin is committed && // the row was inserted by a >> committed transaction, and >> ( >> Xmax is null || // the row has not been deleted, >> or >> ( >> Xmax == my-transaction && // the row is being deleted by >> this transaction >> Cmax >= my-command) || // but it's not deleted "yet", or >> ( >> Xmax != my-transaction && // the row was deleted by >> another transaction >> Xmax is not committed // that has not been committed >> ) >> ) >> ) >> ) >> >> Because cmin and cmax are internally a single system column, >> it is therefore not possible to simply record the status of a row >> that is created and expired in the same multi-statement transaction. >> For that reason, a special combo command id is created that references >> a local memory hash that contains the actual cmin and cmax values. >> It means that if combo id is being used the number we are seeing >> would not be the cmin or cmax it will be an index into a local >> array that contains a structure with has the actual cmin and cmax values. >> >> The following queries (taken mostly from >> http://momjian.us/main/writings/pgsql/mvcc.pdf) >> use the contrib module pageinspect, which allows >> visibility of internal heap page structures and all stored rows, >> including those not visible in the current snapshot. >> (Bit 0x0020 is defined as HEAP_COMBOCID.) >> >> We are exploring 3 examples here: >> 1) INSERT & DELETE in a single transaction >> 2) INSERT & UPDATE in a single transaction >> 3) INSERT from two different transactions & UPDATE from one >> >> test=# drop table mvcc_demo; >> DROP TABLE >> test=# >> test=# create table mvcc_demo (val int); >> CREATE TABLE >> test=# >> test=# TRUNCATE mvcc_demo; >> TRUNCATE TABLE >> test=# >> test=# BEGIN; >> BEGIN >> test=# DELETE FROM mvcc_demo; -- increment command id to show that combo >> id would be different >> DELETE 0 >> test=# DELETE FROM mvcc_demo; >> DELETE 0 >> test=# DELETE FROM mvcc_demo; >> DELETE 0 >> test=# INSERT INTO mvcc_demo VALUES (1); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (2); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (3); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80685 | 0 | 3 | f >> 80685 | 0 | 4 | f >> 80685 | 0 | 5 | f >> (3 rows) >> >> test=# >> test=# DELETE FROM mvcc_demo; >> DELETE 3 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80685 | 80685 | 0 | t >> 80685 | 80685 | 1 | t >> 80685 | 80685 | 2 | t >> (3 rows) >> >> Note that since is_combocid is true the numbers are not cmin/cmax they >> are actually >> the indexes of the internal array already explained above. >> combo id index 0 would contain cmin 3, cmax 6 >> combo id index 1 would contain cmin 4, cmax 6 >> combo id index 2 would contain cmin 5, cmax 6 >> >> test=# >> test=# END; >> COMMIT >> test=# >> test=# >> test=# TRUNCATE mvcc_demo; >> TRUNCATE TABLE >> test=# >> test=# >> test=# >> test=# BEGIN; >> BEGIN >> test=# INSERT INTO mvcc_demo VALUES (1); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (2); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (3); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80675 | 0 | 0 | f >> 80675 | 0 | 1 | f >> 80675 | 0 | 2 | f >> (3 rows) >> >> test=# >> test=# UPDATE mvcc_demo SET val = val * 10; >> UPDATE 3 >> test=# >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80675 | 80675 | 0 | t >> 80675 | 80675 | 1 | t >> 80675 | 80675 | 2 | t >> 80675 | 0 | 3 | f >> 80675 | 0 | 3 | f >> 80675 | 0 | 3 | f >> (6 rows) >> >> test=# >> test=# END; >> COMMIT >> test=# >> test=# >> test=# TRUNCATE mvcc_demo; >> TRUNCATE TABLE >> test=# >> >> -- From one psql issue >> test=# INSERT INTO mvcc_demo VALUES (1); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80677 | 0 | 0 | f >> (1 row) >> >> >> test=# -- From another issue >> test=# BEGIN; >> BEGIN >> test=# INSERT INTO mvcc_demo VALUES (2); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (3); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (4); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80677 | 0 | 0 | f >> 80678 | 0 | 0 | f >> 80678 | 0 | 1 | f >> 80678 | 0 | 2 | f >> (4 rows) >> >> test=# >> test=# UPDATE mvcc_demo SET val = val * 10; >> UPDATE 4 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80678 | 80678 | 0 | t >> 80678 | 80678 | 1 | t >> 80678 | 80678 | 2 | t >> 80677 | 80678 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> (8 rows) >> >> test=# >> >> test=# -- Before finishing this, issue these from the first psql >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80678 | 80678 | 0 | t >> 80678 | 80678 | 1 | t >> 80678 | 80678 | 2 | t >> 80677 | 80678 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> (8 rows) >> >> test=# END; >> COMMIT >> >> >> Now consider the case we are trying to solve >> >> drop table tt1; >> create table tt1(f1 int); >> >> BEGIN; >> insert into tt1 values(1); >> declare c50 cursor for select * from tt1; -- should show one row only >> insert into tt1 values(2); >> fetch all from c50; >> COMMIT; >> >> >> Consider Data node 1 log >> >> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >> committed READ WRITE] >> (b) [exec_simple_query][1026][drop table tt1;] >> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >> committed READ WRITE] >> (f) [exec_simple_query][1026][create table tt1(f1 int);] >> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >> committed READ WRITE] >> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM >> tt1] >> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >> >> The cursor currently shows both inserted rows because command id at data >> node in >> step (j) is 0 >> step (k) is 1 & >> step (l) is 2 >> >> Where as we need command ids to be >> >> step (j) should be 0 >> step (k) should be 2 & >> step (l) should be 1 >> >> This will solve the cursor visibility problem. >> >> To implement this I suggest we send command IDs to data nodes from the >> coordinator >> like we send gxid. The only difference will be that we do not need to >> take command IDs >> from GTM since they are only valid with in the transaction. >> >> See this example >> >> test=# select xmin,xmax,cmin,cmax,* from tt1; >> xmin | xmax | cmin | cmax | f1 >> ------+------+------+------+---- >> (0 rows) >> >> test=# begin; >> BEGIN >> test=# insert into tt1 values(1); >> INSERT 0 1 >> test=# select xmin,xmax,cmin,cmax,* from tt1; >> xmin | xmax | cmin | cmax | f1 >> -------+------+------+------+---- >> 80615 | 0 | 0 | 0 | 1 >> (1 row) >> >> test=# insert into tt1 values(2); >> INSERT 0 1 >> test=# select xmin,xmax,cmin,cmax,* from tt1; >> xmin | xmax | cmin | cmax | f1 >> -------+------+------+------+---- >> 80615 | 0 | 0 | 0 | 1 >> 80615 | 0 | 1 | 1 | 2 >> (2 rows) >> >> test=# insert into tt1 values(3); >> INSERT 0 1 >> test=# select xmin,xmax,cmin,cmax,* from tt1; >> xmin | xmax | cmin | cmax | f1 >> -------+------+------+------+---- >> 80615 | 0 | 0 | 0 | 1 >> 80615 | 0 | 1 | 1 | 2 >> 80615 | 0 | 2 | 2 | 3 >> (3 rows) >> >> test=# insert into tt1 values(4); >> INSERT 0 1 >> test=# select xmin,xmax,cmin,cmax,* from tt1; >> xmin | xmax | cmin | cmax | f1 >> -------+------+------+------+---- >> 80615 | 0 | 0 | 0 | 1 >> 80615 | 0 | 1 | 1 | 2 >> 80615 | 0 | 2 | 2 | 3 >> 80615 | 0 | 3 | 3 | 4 >> (4 rows) >> >> test=# end; >> COMMIT >> test=# >> test=# >> test=# select xmin,xmax,cmin,cmax,* from tt1; >> xmin | xmax | cmin | cmax | f1 >> -------+------+------+------+---- >> 80615 | 0 | 0 | 0 | 1 >> 80615 | 0 | 1 | 1 | 2 >> 80615 | 0 | 2 | 2 | 3 >> 80615 | 0 | 3 | 3 | 4 >> (4 rows) >> >> test=# insert into tt1 values(5); >> INSERT 0 1 >> test=# select xmin,xmax,cmin,cmax,* from tt1; >> xmin | xmax | cmin | cmax | f1 >> -------+------+------+------+---- >> 80615 | 0 | 0 | 0 | 1 >> 80615 | 0 | 1 | 1 | 2 >> 80615 | 0 | 2 | 2 | 3 >> 80615 | 0 | 3 | 3 | 4 >> 80616 | 0 | 0 | 0 | 5 >> (5 rows) >> >> test=# insert into tt1 values(6); >> INSERT 0 1 >> test=# >> test=# >> test=# select xmin,xmax,cmin,cmax,* from tt1; >> xmin | xmax | cmin | cmax | f1 >> -------+------+------+------+---- >> 80615 | 0 | 0 | 0 | 1 >> 80615 | 0 | 1 | 1 | 2 >> 80615 | 0 | 2 | 2 | 3 >> 80615 | 0 | 3 | 3 | 4 >> 80616 | 0 | 0 | 0 | 5 >> 80617 | 0 | 0 | 0 | 6 >> (6 rows) >> >> Note that at the end of the multi-statement transaction the command id >> gets reset to zero. >> >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company > > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Michael Paquier http://michael.otacoo.com |
From: Abbas B. <abb...@en...> - 2012-06-19 08:26:16
|
Hi Ashutosh, Here are the results with the val column, Thanks. test=# drop table mvcc_demo; DROP TABLE test=# test=# create table mvcc_demo (val int); CREATE TABLE test=# test=# TRUNCATE mvcc_demo; TRUNCATE TABLE test=# test=# BEGIN; BEGIN test=# DELETE FROM mvcc_demo; -- increment command id to show that combo id would be different DELETE 0 test=# DELETE FROM mvcc_demo; DELETE 0 test=# DELETE FROM mvcc_demo; DELETE 0 test=# INSERT INTO mvcc_demo VALUES (1); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (2); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (3); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80689 | 0 | 3 | f 80689 | 0 | 4 | f 80689 | 0 | 5 | f (3 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80689 | 0 | 3 | 3 | 1 80689 | 0 | 4 | 4 | 2 80689 | 0 | 5 | 5 | 3 (3 rows) test=# test=# DELETE FROM mvcc_demo; DELETE 3 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80689 | 80689 | 0 | t 80689 | 80689 | 1 | t 80689 | 80689 | 2 | t (3 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val ------+------+------+------+----- (0 rows) test=# test=# END; COMMIT test=# test=# test=# TRUNCATE mvcc_demo; TRUNCATE TABLE test=# BEGIN; BEGIN test=# INSERT INTO mvcc_demo VALUES (1); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (2); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (3); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80693 | 0 | 0 | f 80693 | 0 | 1 | f 80693 | 0 | 2 | f (3 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80693 | 0 | 0 | 0 | 1 80693 | 0 | 1 | 1 | 2 80693 | 0 | 2 | 2 | 3 (3 rows) test=# test=# UPDATE mvcc_demo SET val = 10; UPDATE 3 test=# test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80693 | 80693 | 0 | t 80693 | 80693 | 1 | t 80693 | 80693 | 2 | t 80693 | 0 | 3 | f 80693 | 0 | 3 | f 80693 | 0 | 3 | f (6 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80693 | 0 | 3 | 3 | 10 80693 | 0 | 3 | 3 | 10 80693 | 0 | 3 | 3 | 10 (3 rows) test=# test=# END; COMMIT test=# test=# TRUNCATE mvcc_demo; TRUNCATE TABLE -- From one psql issue test=# INSERT INTO mvcc_demo VALUES (1); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80699 | 0 | 0 | f (1 row) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80699 | 0 | 0 | 0 | 1 (1 row) test=# -- From another issue test=# BEGIN; BEGIN test=# INSERT INTO mvcc_demo VALUES (2); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (3); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (4); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80699 | 0 | 0 | f 80700 | 0 | 0 | f 80700 | 0 | 1 | f 80700 | 0 | 2 | f (4 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80699 | 0 | 0 | 0 | 1 80700 | 0 | 0 | 0 | 2 80700 | 0 | 1 | 1 | 3 80700 | 0 | 2 | 2 | 4 (4 rows) test=# test=# UPDATE mvcc_demo SET val = 10; UPDATE 4 test=# test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80700 | 80700 | 0 | t 80700 | 80700 | 1 | t 80700 | 80700 | 2 | t 80699 | 80700 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f (8 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80700 | 0 | 3 | 3 | 10 80700 | 0 | 3 | 3 | 10 80700 | 0 | 3 | 3 | 10 80700 | 0 | 3 | 3 | 10 (4 rows) test=# -- Before finishing this, issue these from the first psql test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80700 | 80700 | 0 | t 80700 | 80700 | 1 | t 80700 | 80700 | 2 | t 80699 | 80700 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f (8 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+-------+------+------+----- 80699 | 80700 | 3 | 3 | 1 (1 row) test=# end; COMMIT On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier <mic...@gm... > wrote: > Hi, > > I expect pgxc_node_send_cmd_id to have some impact on performance, so be > sure to send it to remote Datanodes really only if necessary. > You should put more severe conditions blocking this function cid can > easily get incremented in Postgres. > > Regards, > > On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt <abb...@en...>wrote: > >> PFA a WIP patch implementing the design presented earlier. >> The patch is WIP because it still has and FIXME and it shows some >> regression failures that need to be fixed, but other than that it confirms >> that the suggested design would work fine. The following test cases now >> work fine >> >> drop table tt1; >> create table tt1(f1 int) distribute by replication; >> >> >> BEGIN; >> insert into tt1 values(1); >> declare c50 cursor for select * from tt1; >> insert into tt1 values(2); >> fetch all from c50; >> COMMIT; >> truncate table tt1; >> >> BEGIN; >> >> declare c50 cursor for select * from tt1; >> insert into tt1 values(1); >> >> insert into tt1 values(2); >> fetch all from c50; >> COMMIT; >> truncate table tt1; >> >> >> BEGIN; >> insert into tt1 values(1); >> insert into tt1 values(2); >> >> declare c50 cursor for select * from tt1; >> insert into tt1 values(3); >> >> fetch all from c50; >> COMMIT; >> truncate table tt1; >> >> >> BEGIN; >> insert into tt1 values(1); >> declare c50 cursor for select * from tt1; >> insert into tt1 values(2); >> declare c51 cursor for select * from tt1; >> insert into tt1 values(3); >> fetch all from c50; >> fetch all from c51; >> COMMIT; >> truncate table tt1; >> >> >> BEGIN; >> insert into tt1 values(1); >> declare c50 cursor for select * from tt1; >> declare c51 cursor for select * from tt1; >> insert into tt1 values(2); >> insert into tt1 values(3); >> fetch all from c50; >> fetch all from c51; >> COMMIT; >> truncate table tt1; >> >> >> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt <abb...@en...>wrote: >> >>> Hi, >>> >>> In a multi-statement transaction each statement is given a command >>> identifier >>> starting from zero and incrementing for each statement. >>> These command indentifers are required for extra tracking because each >>> statement has its own visibility rules with in the transaction. >>> For example, a cursor’s contents must remain unchanged even if later >>> statements in the >>> same transaction modify rows. Such tracking is implemented using system >>> command id >>> columns cmin/cmax, which is internally actually is a single column. >>> >>> cmin/cmax come into play in case of multi-statement transactions only, >>> they are both zero otherwise. >>> >>> cmin "The command identifier of the statement within the inserting >>> transaction." >>> cmax "The command identifier of the statement within the deleting >>> transaction." >>> >>> Here are the visibility rules (taken from comments of tqual.c) >>> >>> ( // A heap tuple is valid "now" >>> iff >>> Xmin == my-transaction && // inserted by the current >>> transaction >>> Cmin < my-command && // before this command, and >>> ( >>> Xmax is null || // the row has not been >>> deleted, or >>> ( >>> Xmax == my-transaction && // it was deleted by the >>> current transaction >>> Cmax >= my-command // but not before this command, >>> ) >>> ) >>> ) >>> || // or >>> ( >>> Xmin is committed && // the row was inserted by a >>> committed transaction, and >>> ( >>> Xmax is null || // the row has not been >>> deleted, or >>> ( >>> Xmax == my-transaction && // the row is being deleted by >>> this transaction >>> Cmax >= my-command) || // but it's not deleted "yet", >>> or >>> ( >>> Xmax != my-transaction && // the row was deleted by >>> another transaction >>> Xmax is not committed // that has not been committed >>> ) >>> ) >>> ) >>> ) >>> >>> Because cmin and cmax are internally a single system column, >>> it is therefore not possible to simply record the status of a row >>> that is created and expired in the same multi-statement transaction. >>> For that reason, a special combo command id is created that references >>> a local memory hash that contains the actual cmin and cmax values. >>> It means that if combo id is being used the number we are seeing >>> would not be the cmin or cmax it will be an index into a local >>> array that contains a structure with has the actual cmin and cmax values. >>> >>> The following queries (taken mostly from >>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>> use the contrib module pageinspect, which allows >>> visibility of internal heap page structures and all stored rows, >>> including those not visible in the current snapshot. >>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>> >>> We are exploring 3 examples here: >>> 1) INSERT & DELETE in a single transaction >>> 2) INSERT & UPDATE in a single transaction >>> 3) INSERT from two different transactions & UPDATE from one >>> >>> test=# drop table mvcc_demo; >>> DROP TABLE >>> test=# >>> test=# create table mvcc_demo (val int); >>> CREATE TABLE >>> test=# >>> test=# TRUNCATE mvcc_demo; >>> TRUNCATE TABLE >>> test=# >>> test=# BEGIN; >>> BEGIN >>> test=# DELETE FROM mvcc_demo; -- increment command id to show that combo >>> id would be different >>> DELETE 0 >>> test=# DELETE FROM mvcc_demo; >>> DELETE 0 >>> test=# DELETE FROM mvcc_demo; >>> DELETE 0 >>> test=# INSERT INTO mvcc_demo VALUES (1); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (2); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (3); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80685 | 0 | 3 | f >>> 80685 | 0 | 4 | f >>> 80685 | 0 | 5 | f >>> (3 rows) >>> >>> test=# >>> test=# DELETE FROM mvcc_demo; >>> DELETE 3 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80685 | 80685 | 0 | t >>> 80685 | 80685 | 1 | t >>> 80685 | 80685 | 2 | t >>> (3 rows) >>> >>> Note that since is_combocid is true the numbers are not cmin/cmax they >>> are actually >>> the indexes of the internal array already explained above. >>> combo id index 0 would contain cmin 3, cmax 6 >>> combo id index 1 would contain cmin 4, cmax 6 >>> combo id index 2 would contain cmin 5, cmax 6 >>> >>> test=# >>> test=# END; >>> COMMIT >>> test=# >>> test=# >>> test=# TRUNCATE mvcc_demo; >>> TRUNCATE TABLE >>> test=# >>> test=# >>> test=# >>> test=# BEGIN; >>> BEGIN >>> test=# INSERT INTO mvcc_demo VALUES (1); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (2); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (3); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80675 | 0 | 0 | f >>> 80675 | 0 | 1 | f >>> 80675 | 0 | 2 | f >>> (3 rows) >>> >>> test=# >>> test=# UPDATE mvcc_demo SET val = val * 10; >>> UPDATE 3 >>> test=# >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80675 | 80675 | 0 | t >>> 80675 | 80675 | 1 | t >>> 80675 | 80675 | 2 | t >>> 80675 | 0 | 3 | f >>> 80675 | 0 | 3 | f >>> 80675 | 0 | 3 | f >>> (6 rows) >>> >>> test=# >>> test=# END; >>> COMMIT >>> test=# >>> test=# >>> test=# TRUNCATE mvcc_demo; >>> TRUNCATE TABLE >>> test=# >>> >>> -- From one psql issue >>> test=# INSERT INTO mvcc_demo VALUES (1); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80677 | 0 | 0 | f >>> (1 row) >>> >>> >>> test=# -- From another issue >>> test=# BEGIN; >>> BEGIN >>> test=# INSERT INTO mvcc_demo VALUES (2); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (3); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (4); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80677 | 0 | 0 | f >>> 80678 | 0 | 0 | f >>> 80678 | 0 | 1 | f >>> 80678 | 0 | 2 | f >>> (4 rows) >>> >>> test=# >>> test=# UPDATE mvcc_demo SET val = val * 10; >>> UPDATE 4 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80678 | 80678 | 0 | t >>> 80678 | 80678 | 1 | t >>> 80678 | 80678 | 2 | t >>> 80677 | 80678 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> (8 rows) >>> >>> test=# >>> >>> test=# -- Before finishing this, issue these from the first psql >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80678 | 80678 | 0 | t >>> 80678 | 80678 | 1 | t >>> 80678 | 80678 | 2 | t >>> 80677 | 80678 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> (8 rows) >>> >>> test=# END; >>> COMMIT >>> >>> >>> Now consider the case we are trying to solve >>> >>> drop table tt1; >>> create table tt1(f1 int); >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; -- should show one row only >>> insert into tt1 values(2); >>> fetch all from c50; >>> COMMIT; >>> >>> >>> Consider Data node 1 log >>> >>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>> committed READ WRITE] >>> (b) [exec_simple_query][1026][drop table tt1;] >>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>> committed READ WRITE] >>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>> committed READ WRITE] >>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM >>> tt1] >>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>> >>> The cursor currently shows both inserted rows because command id at data >>> node in >>> step (j) is 0 >>> step (k) is 1 & >>> step (l) is 2 >>> >>> Where as we need command ids to be >>> >>> step (j) should be 0 >>> step (k) should be 2 & >>> step (l) should be 1 >>> >>> This will solve the cursor visibility problem. >>> >>> To implement this I suggest we send command IDs to data nodes from the >>> coordinator >>> like we send gxid. The only difference will be that we do not need to >>> take command IDs >>> from GTM since they are only valid with in the transaction. >>> >>> See this example >>> >>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> xmin | xmax | cmin | cmax | f1 >>> ------+------+------+------+---- >>> (0 rows) >>> >>> test=# begin; >>> BEGIN >>> test=# insert into tt1 values(1); >>> INSERT 0 1 >>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> xmin | xmax | cmin | cmax | f1 >>> -------+------+------+------+---- >>> 80615 | 0 | 0 | 0 | 1 >>> (1 row) >>> >>> test=# insert into tt1 values(2); >>> INSERT 0 1 >>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> xmin | xmax | cmin | cmax | f1 >>> -------+------+------+------+---- >>> 80615 | 0 | 0 | 0 | 1 >>> 80615 | 0 | 1 | 1 | 2 >>> (2 rows) >>> >>> test=# insert into tt1 values(3); >>> INSERT 0 1 >>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> xmin | xmax | cmin | cmax | f1 >>> -------+------+------+------+---- >>> 80615 | 0 | 0 | 0 | 1 >>> 80615 | 0 | 1 | 1 | 2 >>> 80615 | 0 | 2 | 2 | 3 >>> (3 rows) >>> >>> test=# insert into tt1 values(4); >>> INSERT 0 1 >>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> xmin | xmax | cmin | cmax | f1 >>> -------+------+------+------+---- >>> 80615 | 0 | 0 | 0 | 1 >>> 80615 | 0 | 1 | 1 | 2 >>> 80615 | 0 | 2 | 2 | 3 >>> 80615 | 0 | 3 | 3 | 4 >>> (4 rows) >>> >>> test=# end; >>> COMMIT >>> test=# >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> xmin | xmax | cmin | cmax | f1 >>> -------+------+------+------+---- >>> 80615 | 0 | 0 | 0 | 1 >>> 80615 | 0 | 1 | 1 | 2 >>> 80615 | 0 | 2 | 2 | 3 >>> 80615 | 0 | 3 | 3 | 4 >>> (4 rows) >>> >>> test=# insert into tt1 values(5); >>> INSERT 0 1 >>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> xmin | xmax | cmin | cmax | f1 >>> -------+------+------+------+---- >>> 80615 | 0 | 0 | 0 | 1 >>> 80615 | 0 | 1 | 1 | 2 >>> 80615 | 0 | 2 | 2 | 3 >>> 80615 | 0 | 3 | 3 | 4 >>> 80616 | 0 | 0 | 0 | 5 >>> (5 rows) >>> >>> test=# insert into tt1 values(6); >>> INSERT 0 1 >>> test=# >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> xmin | xmax | cmin | cmax | f1 >>> -------+------+------+------+---- >>> 80615 | 0 | 0 | 0 | 1 >>> 80615 | 0 | 1 | 1 | 2 >>> 80615 | 0 | 2 | 2 | 3 >>> 80615 | 0 | 3 | 3 | 4 >>> 80616 | 0 | 0 | 0 | 5 >>> 80617 | 0 | 0 | 0 | 6 >>> (6 rows) >>> >>> Note that at the end of the multi-statement transaction the command id >>> gets reset to zero. >>> >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Ashutosh B. <ash...@en...> - 2012-06-19 08:54:47
|
Hi Abbas, I have few comments to make 1. With this patch there are two variables for having command Id, that is going to cause confusion and will be a maintenance burden, might be error prone. Is it possible to use a single variable instead of two? Right now there is some code which is specific to cursors in your patch. If you can plug the coordinator command id somehow into currentCommandId, you won't need that code and any other code which needs coordinator command ID will be automatically taken care of. 2. A non-transaction on coordinator can spawn tranasactions on datanode or subtransactions (if there is already a transaction running). Does your patch handle that case? Should we do more thorough research in the transaction management, esp. to see the impact of getting same command id for two commands on the datanode? On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt <abb...@en...>wrote: > Hi Ashutosh, > Here are the results with the val column, Thanks. > > test=# drop table mvcc_demo; > DROP TABLE > test=# > test=# create table mvcc_demo (val int); > CREATE TABLE > test=# > test=# TRUNCATE mvcc_demo; > TRUNCATE TABLE > test=# > test=# BEGIN; > BEGIN > test=# DELETE FROM mvcc_demo; -- increment command id to show that combo > id would be different > DELETE 0 > test=# DELETE FROM mvcc_demo; > DELETE 0 > test=# DELETE FROM mvcc_demo; > DELETE 0 > test=# INSERT INTO mvcc_demo VALUES (1); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (2); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (3); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80689 | 0 | 3 | f > 80689 | 0 | 4 | f > 80689 | 0 | 5 | f > (3 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80689 | 0 | 3 | 3 | 1 > 80689 | 0 | 4 | 4 | 2 > 80689 | 0 | 5 | 5 | 3 > > (3 rows) > > test=# > test=# DELETE FROM mvcc_demo; > DELETE 3 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80689 | 80689 | 0 | t > 80689 | 80689 | 1 | t > 80689 | 80689 | 2 | t > (3 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > ------+------+------+------+----- > (0 rows) > > > test=# > test=# END; > COMMIT > test=# > test=# > test=# TRUNCATE mvcc_demo; > TRUNCATE TABLE > > > > > > > > > > > test=# BEGIN; > BEGIN > test=# INSERT INTO mvcc_demo VALUES (1); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (2); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (3); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80693 | 0 | 0 | f > 80693 | 0 | 1 | f > 80693 | 0 | 2 | f > (3 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80693 | 0 | 0 | 0 | 1 > 80693 | 0 | 1 | 1 | 2 > 80693 | 0 | 2 | 2 | 3 > (3 rows) > > test=# > test=# UPDATE mvcc_demo SET val = 10; > > UPDATE 3 > test=# > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80693 | 80693 | 0 | t > 80693 | 80693 | 1 | t > 80693 | 80693 | 2 | t > 80693 | 0 | 3 | f > 80693 | 0 | 3 | f > 80693 | 0 | 3 | f > (6 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80693 | 0 | 3 | 3 | 10 > 80693 | 0 | 3 | 3 | 10 > 80693 | 0 | 3 | 3 | 10 > (3 rows) > > > test=# > test=# END; > COMMIT > test=# > test=# TRUNCATE mvcc_demo; > TRUNCATE TABLE > > > > > > > > > > > > -- From one psql issue > test=# INSERT INTO mvcc_demo VALUES (1); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS > is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80699 | 0 | 0 | f > (1 row) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80699 | 0 | 0 | 0 | 1 > (1 row) > > > > > > test=# -- From another issue > test=# BEGIN; > BEGIN > test=# INSERT INTO mvcc_demo VALUES (2); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (3); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (4); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80699 | 0 | 0 | f > 80700 | 0 | 0 | f > 80700 | 0 | 1 | f > 80700 | 0 | 2 | f > (4 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80699 | 0 | 0 | 0 | 1 > 80700 | 0 | 0 | 0 | 2 > 80700 | 0 | 1 | 1 | 3 > 80700 | 0 | 2 | 2 | 4 > (4 rows) > > test=# > test=# UPDATE mvcc_demo SET val = 10; > > UPDATE 4 > test=# > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80700 | 80700 | 0 | t > 80700 | 80700 | 1 | t > 80700 | 80700 | 2 | t > 80699 | 80700 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > (8 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80700 | 0 | 3 | 3 | 10 > 80700 | 0 | 3 | 3 | 10 > 80700 | 0 | 3 | 3 | 10 > 80700 | 0 | 3 | 3 | 10 > (4 rows) > > > > > test=# -- Before finishing this, issue these from the first psql > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS > is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80700 | 80700 | 0 | t > 80700 | 80700 | 1 | t > 80700 | 80700 | 2 | t > 80699 | 80700 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > (8 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+-------+------+------+----- > 80699 | 80700 | 3 | 3 | 1 > (1 row) > > test=# end; > COMMIT > > > On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier < > mic...@gm...> wrote: > >> Hi, >> >> I expect pgxc_node_send_cmd_id to have some impact on performance, so be >> sure to send it to remote Datanodes really only if necessary. >> You should put more severe conditions blocking this function cid can >> easily get incremented in Postgres. >> >> Regards, >> >> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt <abb...@en...>wrote: >> >>> PFA a WIP patch implementing the design presented earlier. >>> The patch is WIP because it still has and FIXME and it shows some >>> regression failures that need to be fixed, but other than that it confirms >>> that the suggested design would work fine. The following test cases now >>> work fine >>> >>> drop table tt1; >>> create table tt1(f1 int) distribute by replication; >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(2); >>> fetch all from c50; >>> COMMIT; >>> truncate table tt1; >>> >>> BEGIN; >>> >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(1); >>> >>> insert into tt1 values(2); >>> fetch all from c50; >>> COMMIT; >>> truncate table tt1; >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> insert into tt1 values(2); >>> >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(3); >>> >>> fetch all from c50; >>> COMMIT; >>> truncate table tt1; >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(2); >>> declare c51 cursor for select * from tt1; >>> insert into tt1 values(3); >>> fetch all from c50; >>> fetch all from c51; >>> COMMIT; >>> truncate table tt1; >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; >>> declare c51 cursor for select * from tt1; >>> insert into tt1 values(2); >>> insert into tt1 values(3); >>> fetch all from c50; >>> fetch all from c51; >>> COMMIT; >>> truncate table tt1; >>> >>> >>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt <abb...@en... >>> > wrote: >>> >>>> Hi, >>>> >>>> In a multi-statement transaction each statement is given a command >>>> identifier >>>> starting from zero and incrementing for each statement. >>>> These command indentifers are required for extra tracking because each >>>> statement has its own visibility rules with in the transaction. >>>> For example, a cursor’s contents must remain unchanged even if later >>>> statements in the >>>> same transaction modify rows. Such tracking is implemented using >>>> system command id >>>> columns cmin/cmax, which is internally actually is a single column. >>>> >>>> cmin/cmax come into play in case of multi-statement transactions only, >>>> they are both zero otherwise. >>>> >>>> cmin "The command identifier of the statement within the inserting >>>> transaction." >>>> cmax "The command identifier of the statement within the deleting >>>> transaction." >>>> >>>> Here are the visibility rules (taken from comments of tqual.c) >>>> >>>> ( // A heap tuple is valid "now" >>>> iff >>>> Xmin == my-transaction && // inserted by the current >>>> transaction >>>> Cmin < my-command && // before this command, and >>>> ( >>>> Xmax is null || // the row has not been >>>> deleted, or >>>> ( >>>> Xmax == my-transaction && // it was deleted by the >>>> current transaction >>>> Cmax >= my-command // but not before this command, >>>> ) >>>> ) >>>> ) >>>> || // or >>>> ( >>>> Xmin is committed && // the row was inserted by a >>>> committed transaction, and >>>> ( >>>> Xmax is null || // the row has not been >>>> deleted, or >>>> ( >>>> Xmax == my-transaction && // the row is being deleted by >>>> this transaction >>>> Cmax >= my-command) || // but it's not deleted "yet", >>>> or >>>> ( >>>> Xmax != my-transaction && // the row was deleted by >>>> another transaction >>>> Xmax is not committed // that has not been committed >>>> ) >>>> ) >>>> ) >>>> ) >>>> >>>> Because cmin and cmax are internally a single system column, >>>> it is therefore not possible to simply record the status of a row >>>> that is created and expired in the same multi-statement transaction. >>>> For that reason, a special combo command id is created that references >>>> a local memory hash that contains the actual cmin and cmax values. >>>> It means that if combo id is being used the number we are seeing >>>> would not be the cmin or cmax it will be an index into a local >>>> array that contains a structure with has the actual cmin and cmax >>>> values. >>>> >>>> The following queries (taken mostly from >>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>>> use the contrib module pageinspect, which allows >>>> visibility of internal heap page structures and all stored rows, >>>> including those not visible in the current snapshot. >>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>>> >>>> We are exploring 3 examples here: >>>> 1) INSERT & DELETE in a single transaction >>>> 2) INSERT & UPDATE in a single transaction >>>> 3) INSERT from two different transactions & UPDATE from one >>>> >>>> test=# drop table mvcc_demo; >>>> DROP TABLE >>>> test=# >>>> test=# create table mvcc_demo (val int); >>>> CREATE TABLE >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> test=# >>>> test=# BEGIN; >>>> BEGIN >>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>> combo id would be different >>>> DELETE 0 >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 0 >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 0 >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80685 | 0 | 3 | f >>>> 80685 | 0 | 4 | f >>>> 80685 | 0 | 5 | f >>>> (3 rows) >>>> >>>> test=# >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 3 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80685 | 80685 | 0 | t >>>> 80685 | 80685 | 1 | t >>>> 80685 | 80685 | 2 | t >>>> (3 rows) >>>> >>>> Note that since is_combocid is true the numbers are not cmin/cmax they >>>> are actually >>>> the indexes of the internal array already explained above. >>>> combo id index 0 would contain cmin 3, cmax 6 >>>> combo id index 1 would contain cmin 4, cmax 6 >>>> combo id index 2 would contain cmin 5, cmax 6 >>>> >>>> test=# >>>> test=# END; >>>> COMMIT >>>> test=# >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> test=# >>>> test=# >>>> test=# >>>> test=# BEGIN; >>>> BEGIN >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80675 | 0 | 0 | f >>>> 80675 | 0 | 1 | f >>>> 80675 | 0 | 2 | f >>>> (3 rows) >>>> >>>> test=# >>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>> UPDATE 3 >>>> test=# >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80675 | 80675 | 0 | t >>>> 80675 | 80675 | 1 | t >>>> 80675 | 80675 | 2 | t >>>> 80675 | 0 | 3 | f >>>> 80675 | 0 | 3 | f >>>> 80675 | 0 | 3 | f >>>> (6 rows) >>>> >>>> test=# >>>> test=# END; >>>> COMMIT >>>> test=# >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> test=# >>>> >>>> -- From one psql issue >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80677 | 0 | 0 | f >>>> (1 row) >>>> >>>> >>>> test=# -- From another issue >>>> test=# BEGIN; >>>> BEGIN >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80677 | 0 | 0 | f >>>> 80678 | 0 | 0 | f >>>> 80678 | 0 | 1 | f >>>> 80678 | 0 | 2 | f >>>> (4 rows) >>>> >>>> test=# >>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>> UPDATE 4 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80678 | 80678 | 0 | t >>>> 80678 | 80678 | 1 | t >>>> 80678 | 80678 | 2 | t >>>> 80677 | 80678 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> (8 rows) >>>> >>>> test=# >>>> >>>> test=# -- Before finishing this, issue these from the first psql >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80678 | 80678 | 0 | t >>>> 80678 | 80678 | 1 | t >>>> 80678 | 80678 | 2 | t >>>> 80677 | 80678 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> (8 rows) >>>> >>>> test=# END; >>>> COMMIT >>>> >>>> >>>> Now consider the case we are trying to solve >>>> >>>> drop table tt1; >>>> create table tt1(f1 int); >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> declare c50 cursor for select * from tt1; -- should show one row only >>>> insert into tt1 values(2); >>>> fetch all from c50; >>>> COMMIT; >>>> >>>> >>>> Consider Data node 1 log >>>> >>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>> committed READ WRITE] >>>> (b) [exec_simple_query][1026][drop table tt1;] >>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>> committed READ WRITE] >>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>> committed READ WRITE] >>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM >>>> tt1] >>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>>> >>>> The cursor currently shows both inserted rows because command id at >>>> data node in >>>> step (j) is 0 >>>> step (k) is 1 & >>>> step (l) is 2 >>>> >>>> Where as we need command ids to be >>>> >>>> step (j) should be 0 >>>> step (k) should be 2 & >>>> step (l) should be 1 >>>> >>>> This will solve the cursor visibility problem. >>>> >>>> To implement this I suggest we send command IDs to data nodes from the >>>> coordinator >>>> like we send gxid. The only difference will be that we do not need to >>>> take command IDs >>>> from GTM since they are only valid with in the transaction. >>>> >>>> See this example >>>> >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> ------+------+------+------+---- >>>> (0 rows) >>>> >>>> test=# begin; >>>> BEGIN >>>> test=# insert into tt1 values(1); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> (1 row) >>>> >>>> test=# insert into tt1 values(2); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> (2 rows) >>>> >>>> test=# insert into tt1 values(3); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> (3 rows) >>>> >>>> test=# insert into tt1 values(4); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> 80615 | 0 | 3 | 3 | 4 >>>> (4 rows) >>>> >>>> test=# end; >>>> COMMIT >>>> test=# >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> 80615 | 0 | 3 | 3 | 4 >>>> (4 rows) >>>> >>>> test=# insert into tt1 values(5); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> 80615 | 0 | 3 | 3 | 4 >>>> 80616 | 0 | 0 | 0 | 5 >>>> (5 rows) >>>> >>>> test=# insert into tt1 values(6); >>>> INSERT 0 1 >>>> test=# >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> 80615 | 0 | 3 | 3 | 4 >>>> 80616 | 0 | 0 | 0 | 5 >>>> 80617 | 0 | 0 | 0 | 6 >>>> (6 rows) >>>> >>>> Note that at the end of the multi-statement transaction the command id >>>> gets reset to zero. >>>> >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company >>> >>> >>> >>> >>> -- >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >>> >>> Phone: 92-334-5100153 >>> >>> Website: www.enterprisedb.com >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Abbas B. <abb...@en...> - 2012-06-19 09:43:41
|
Thanks for your comments. On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat < ash...@en...> wrote: > Hi Abbas, > I have few comments to make > 1. With this patch there are two variables for having command Id, that is > going to cause confusion and will be a maintenance burden, might be error > prone. Is it possible to use a single variable instead of two? Are you talking about receivedCommandId and currentCommandId? If yes, I would prefer not having a packet received from coordinator overwrite the currentCommandId at data node, because I am not 100% sure about the life time of currentCommandId, I might overwrite it before time. It would be safe to let currentCommandId as is unless we are compelled to get the next command ID, and have the received command id take priority at that time. > Right now there is some code which is specific to cursors in your patch. > If you can plug the coordinator command id somehow into currentCommandId, > you won't need that code and any other code which needs coordinator command > ID will be automatically taken care of. > That code is required to solve a problem. Consider this case when a coordinator received this transaction BEGIN; insert into tt1 values(1); declare c50 cursor for select * from tt1; insert into tt1 values(2); fetch all from c50; COMMIT; While sending select to the data node in response to a fetch we need to know what was the command ID of the declare cursor statement and we need to send that command ID to the data node for this particular fetch. This is the main idea behind this solution. The first insert goes to the data node with command id 0, the second insert goes with 2. Command ID 1 is consumed by declare cursor. When coordinator sees fetch it needs to send select to the data node with command ID 1 rather than 3. > 2. A non-transaction on coordinator can spawn tranasactions on datanode or > subtransactions (if there is already a transaction running). Does your > patch handle that case? No and it does not need to, because that case has no known problems that we need to solve. I don't think my patch would impact any such case but I will analyze any failures that I may get in regressions. > Should we do more thorough research in the transaction management, esp. to > see the impact of getting same command id for two commands on the datanode? > If we issue two commands with the same command ID then we will definitely have visibility issues according to the rules I have already explained. But we will not have two commands sent to the data node with same command id. > > > On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt <abb...@en...>wrote: > >> Hi Ashutosh, >> Here are the results with the val column, Thanks. >> >> test=# drop table mvcc_demo; >> DROP TABLE >> test=# >> test=# create table mvcc_demo (val int); >> CREATE TABLE >> test=# >> test=# TRUNCATE mvcc_demo; >> TRUNCATE TABLE >> test=# >> test=# BEGIN; >> BEGIN >> test=# DELETE FROM mvcc_demo; -- increment command id to show that combo >> id would be different >> DELETE 0 >> test=# DELETE FROM mvcc_demo; >> DELETE 0 >> test=# DELETE FROM mvcc_demo; >> DELETE 0 >> test=# INSERT INTO mvcc_demo VALUES (1); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (2); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (3); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80689 | 0 | 3 | f >> 80689 | 0 | 4 | f >> 80689 | 0 | 5 | f >> (3 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> -------+------+------+------+----- >> 80689 | 0 | 3 | 3 | 1 >> 80689 | 0 | 4 | 4 | 2 >> 80689 | 0 | 5 | 5 | 3 >> >> (3 rows) >> >> test=# >> test=# DELETE FROM mvcc_demo; >> DELETE 3 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80689 | 80689 | 0 | t >> 80689 | 80689 | 1 | t >> 80689 | 80689 | 2 | t >> (3 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> ------+------+------+------+----- >> (0 rows) >> >> >> test=# >> test=# END; >> COMMIT >> test=# >> test=# >> test=# TRUNCATE mvcc_demo; >> TRUNCATE TABLE >> >> >> >> >> >> >> >> >> >> >> test=# BEGIN; >> BEGIN >> test=# INSERT INTO mvcc_demo VALUES (1); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (2); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (3); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80693 | 0 | 0 | f >> 80693 | 0 | 1 | f >> 80693 | 0 | 2 | f >> (3 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> -------+------+------+------+----- >> 80693 | 0 | 0 | 0 | 1 >> 80693 | 0 | 1 | 1 | 2 >> 80693 | 0 | 2 | 2 | 3 >> (3 rows) >> >> test=# >> test=# UPDATE mvcc_demo SET val = 10; >> >> UPDATE 3 >> test=# >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80693 | 80693 | 0 | t >> 80693 | 80693 | 1 | t >> 80693 | 80693 | 2 | t >> 80693 | 0 | 3 | f >> 80693 | 0 | 3 | f >> 80693 | 0 | 3 | f >> (6 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> -------+------+------+------+----- >> 80693 | 0 | 3 | 3 | 10 >> 80693 | 0 | 3 | 3 | 10 >> 80693 | 0 | 3 | 3 | 10 >> (3 rows) >> >> >> test=# >> test=# END; >> COMMIT >> test=# >> test=# TRUNCATE mvcc_demo; >> TRUNCATE TABLE >> >> >> >> >> >> >> >> >> >> >> >> -- From one psql issue >> test=# INSERT INTO mvcc_demo VALUES (1); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80699 | 0 | 0 | f >> (1 row) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> -------+------+------+------+----- >> 80699 | 0 | 0 | 0 | 1 >> (1 row) >> >> >> >> >> >> test=# -- From another issue >> test=# BEGIN; >> BEGIN >> test=# INSERT INTO mvcc_demo VALUES (2); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (3); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (4); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80699 | 0 | 0 | f >> 80700 | 0 | 0 | f >> 80700 | 0 | 1 | f >> 80700 | 0 | 2 | f >> (4 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> -------+------+------+------+----- >> 80699 | 0 | 0 | 0 | 1 >> 80700 | 0 | 0 | 0 | 2 >> 80700 | 0 | 1 | 1 | 3 >> 80700 | 0 | 2 | 2 | 4 >> (4 rows) >> >> test=# >> test=# UPDATE mvcc_demo SET val = 10; >> >> UPDATE 4 >> test=# >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80700 | 80700 | 0 | t >> 80700 | 80700 | 1 | t >> 80700 | 80700 | 2 | t >> 80699 | 80700 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> (8 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> -------+------+------+------+----- >> 80700 | 0 | 3 | 3 | 10 >> 80700 | 0 | 3 | 3 | 10 >> 80700 | 0 | 3 | 3 | 10 >> 80700 | 0 | 3 | 3 | 10 >> (4 rows) >> >> >> >> >> test=# -- Before finishing this, issue these from the first psql >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80700 | 80700 | 0 | t >> 80700 | 80700 | 1 | t >> 80700 | 80700 | 2 | t >> 80699 | 80700 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> (8 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> -------+-------+------+------+----- >> 80699 | 80700 | 3 | 3 | 1 >> (1 row) >> >> test=# end; >> COMMIT >> >> >> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi, >>> >>> I expect pgxc_node_send_cmd_id to have some impact on performance, so be >>> sure to send it to remote Datanodes really only if necessary. >>> You should put more severe conditions blocking this function cid can >>> easily get incremented in Postgres. >>> >>> Regards, >>> >>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt <abb...@en... >>> > wrote: >>> >>>> PFA a WIP patch implementing the design presented earlier. >>>> The patch is WIP because it still has and FIXME and it shows some >>>> regression failures that need to be fixed, but other than that it confirms >>>> that the suggested design would work fine. The following test cases now >>>> work fine >>>> >>>> drop table tt1; >>>> create table tt1(f1 int) distribute by replication; >>>> >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> declare c50 cursor for select * from tt1; >>>> insert into tt1 values(2); >>>> fetch all from c50; >>>> COMMIT; >>>> truncate table tt1; >>>> >>>> BEGIN; >>>> >>>> declare c50 cursor for select * from tt1; >>>> insert into tt1 values(1); >>>> >>>> insert into tt1 values(2); >>>> fetch all from c50; >>>> COMMIT; >>>> truncate table tt1; >>>> >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> insert into tt1 values(2); >>>> >>>> declare c50 cursor for select * from tt1; >>>> insert into tt1 values(3); >>>> >>>> fetch all from c50; >>>> COMMIT; >>>> truncate table tt1; >>>> >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> declare c50 cursor for select * from tt1; >>>> insert into tt1 values(2); >>>> declare c51 cursor for select * from tt1; >>>> insert into tt1 values(3); >>>> fetch all from c50; >>>> fetch all from c51; >>>> COMMIT; >>>> truncate table tt1; >>>> >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> declare c50 cursor for select * from tt1; >>>> declare c51 cursor for select * from tt1; >>>> insert into tt1 values(2); >>>> insert into tt1 values(3); >>>> fetch all from c50; >>>> fetch all from c51; >>>> COMMIT; >>>> truncate table tt1; >>>> >>>> >>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt < >>>> abb...@en...> wrote: >>>> >>>>> Hi, >>>>> >>>>> In a multi-statement transaction each statement is given a command >>>>> identifier >>>>> starting from zero and incrementing for each statement. >>>>> These command indentifers are required for extra tracking because each >>>>> statement has its own visibility rules with in the transaction. >>>>> For example, a cursor’s contents must remain unchanged even if later >>>>> statements in the >>>>> same transaction modify rows. Such tracking is implemented using >>>>> system command id >>>>> columns cmin/cmax, which is internally actually is a single column. >>>>> >>>>> cmin/cmax come into play in case of multi-statement transactions only, >>>>> they are both zero otherwise. >>>>> >>>>> cmin "The command identifier of the statement within the inserting >>>>> transaction." >>>>> cmax "The command identifier of the statement within the deleting >>>>> transaction." >>>>> >>>>> Here are the visibility rules (taken from comments of tqual.c) >>>>> >>>>> ( // A heap tuple is valid >>>>> "now" iff >>>>> Xmin == my-transaction && // inserted by the current >>>>> transaction >>>>> Cmin < my-command && // before this command, and >>>>> ( >>>>> Xmax is null || // the row has not been >>>>> deleted, or >>>>> ( >>>>> Xmax == my-transaction && // it was deleted by the >>>>> current transaction >>>>> Cmax >= my-command // but not before this >>>>> command, >>>>> ) >>>>> ) >>>>> ) >>>>> || // or >>>>> ( >>>>> Xmin is committed && // the row was inserted by a >>>>> committed transaction, and >>>>> ( >>>>> Xmax is null || // the row has not been >>>>> deleted, or >>>>> ( >>>>> Xmax == my-transaction && // the row is being deleted >>>>> by this transaction >>>>> Cmax >= my-command) || // but it's not deleted >>>>> "yet", or >>>>> ( >>>>> Xmax != my-transaction && // the row was deleted by >>>>> another transaction >>>>> Xmax is not committed // that has not been committed >>>>> ) >>>>> ) >>>>> ) >>>>> ) >>>>> >>>>> Because cmin and cmax are internally a single system column, >>>>> it is therefore not possible to simply record the status of a row >>>>> that is created and expired in the same multi-statement transaction. >>>>> For that reason, a special combo command id is created that references >>>>> a local memory hash that contains the actual cmin and cmax values. >>>>> It means that if combo id is being used the number we are seeing >>>>> would not be the cmin or cmax it will be an index into a local >>>>> array that contains a structure with has the actual cmin and cmax >>>>> values. >>>>> >>>>> The following queries (taken mostly from >>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>>>> use the contrib module pageinspect, which allows >>>>> visibility of internal heap page structures and all stored rows, >>>>> including those not visible in the current snapshot. >>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>>>> >>>>> We are exploring 3 examples here: >>>>> 1) INSERT & DELETE in a single transaction >>>>> 2) INSERT & UPDATE in a single transaction >>>>> 3) INSERT from two different transactions & UPDATE from one >>>>> >>>>> test=# drop table mvcc_demo; >>>>> DROP TABLE >>>>> test=# >>>>> test=# create table mvcc_demo (val int); >>>>> CREATE TABLE >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> test=# >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>>> combo id would be different >>>>> DELETE 0 >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 0 >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 0 >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80685 | 0 | 3 | f >>>>> 80685 | 0 | 4 | f >>>>> 80685 | 0 | 5 | f >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 3 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80685 | 80685 | 0 | t >>>>> 80685 | 80685 | 1 | t >>>>> 80685 | 80685 | 2 | t >>>>> (3 rows) >>>>> >>>>> Note that since is_combocid is true the numbers are not cmin/cmax they >>>>> are actually >>>>> the indexes of the internal array already explained above. >>>>> combo id index 0 would contain cmin 3, cmax 6 >>>>> combo id index 1 would contain cmin 4, cmax 6 >>>>> combo id index 2 would contain cmin 5, cmax 6 >>>>> >>>>> test=# >>>>> test=# END; >>>>> COMMIT >>>>> test=# >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> test=# >>>>> test=# >>>>> test=# >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80675 | 0 | 0 | f >>>>> 80675 | 0 | 1 | f >>>>> 80675 | 0 | 2 | f >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>> UPDATE 3 >>>>> test=# >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80675 | 80675 | 0 | t >>>>> 80675 | 80675 | 1 | t >>>>> 80675 | 80675 | 2 | t >>>>> 80675 | 0 | 3 | f >>>>> 80675 | 0 | 3 | f >>>>> 80675 | 0 | 3 | f >>>>> (6 rows) >>>>> >>>>> test=# >>>>> test=# END; >>>>> COMMIT >>>>> test=# >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> test=# >>>>> >>>>> -- From one psql issue >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80677 | 0 | 0 | f >>>>> (1 row) >>>>> >>>>> >>>>> test=# -- From another issue >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80677 | 0 | 0 | f >>>>> 80678 | 0 | 0 | f >>>>> 80678 | 0 | 1 | f >>>>> 80678 | 0 | 2 | f >>>>> (4 rows) >>>>> >>>>> test=# >>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>> UPDATE 4 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80678 | 80678 | 0 | t >>>>> 80678 | 80678 | 1 | t >>>>> 80678 | 80678 | 2 | t >>>>> 80677 | 80678 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> (8 rows) >>>>> >>>>> test=# >>>>> >>>>> test=# -- Before finishing this, issue these from the first psql >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80678 | 80678 | 0 | t >>>>> 80678 | 80678 | 1 | t >>>>> 80678 | 80678 | 2 | t >>>>> 80677 | 80678 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> (8 rows) >>>>> >>>>> test=# END; >>>>> COMMIT >>>>> >>>>> >>>>> Now consider the case we are trying to solve >>>>> >>>>> drop table tt1; >>>>> create table tt1(f1 int); >>>>> >>>>> BEGIN; >>>>> insert into tt1 values(1); >>>>> declare c50 cursor for select * from tt1; -- should show one row only >>>>> insert into tt1 values(2); >>>>> fetch all from c50; >>>>> COMMIT; >>>>> >>>>> >>>>> Consider Data node 1 log >>>>> >>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>> committed READ WRITE] >>>>> (b) [exec_simple_query][1026][drop table tt1;] >>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>> committed READ WRITE] >>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>> committed READ WRITE] >>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM >>>>> tt1] >>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>>>> >>>>> The cursor currently shows both inserted rows because command id at >>>>> data node in >>>>> step (j) is 0 >>>>> step (k) is 1 & >>>>> step (l) is 2 >>>>> >>>>> Where as we need command ids to be >>>>> >>>>> step (j) should be 0 >>>>> step (k) should be 2 & >>>>> step (l) should be 1 >>>>> >>>>> This will solve the cursor visibility problem. >>>>> >>>>> To implement this I suggest we send command IDs to data nodes from the >>>>> coordinator >>>>> like we send gxid. The only difference will be that we do not need to >>>>> take command IDs >>>>> from GTM since they are only valid with in the transaction. >>>>> >>>>> See this example >>>>> >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> ------+------+------+------+---- >>>>> (0 rows) >>>>> >>>>> test=# begin; >>>>> BEGIN >>>>> test=# insert into tt1 values(1); >>>>> INSERT 0 1 >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> (1 row) >>>>> >>>>> test=# insert into tt1 values(2); >>>>> INSERT 0 1 >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> 80615 | 0 | 1 | 1 | 2 >>>>> (2 rows) >>>>> >>>>> test=# insert into tt1 values(3); >>>>> INSERT 0 1 >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> 80615 | 0 | 1 | 1 | 2 >>>>> 80615 | 0 | 2 | 2 | 3 >>>>> (3 rows) >>>>> >>>>> test=# insert into tt1 values(4); >>>>> INSERT 0 1 >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> 80615 | 0 | 1 | 1 | 2 >>>>> 80615 | 0 | 2 | 2 | 3 >>>>> 80615 | 0 | 3 | 3 | 4 >>>>> (4 rows) >>>>> >>>>> test=# end; >>>>> COMMIT >>>>> test=# >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> 80615 | 0 | 1 | 1 | 2 >>>>> 80615 | 0 | 2 | 2 | 3 >>>>> 80615 | 0 | 3 | 3 | 4 >>>>> (4 rows) >>>>> >>>>> test=# insert into tt1 values(5); >>>>> INSERT 0 1 >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> 80615 | 0 | 1 | 1 | 2 >>>>> 80615 | 0 | 2 | 2 | 3 >>>>> 80615 | 0 | 3 | 3 | 4 >>>>> 80616 | 0 | 0 | 0 | 5 >>>>> (5 rows) >>>>> >>>>> test=# insert into tt1 values(6); >>>>> INSERT 0 1 >>>>> test=# >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> 80615 | 0 | 1 | 1 | 2 >>>>> 80615 | 0 | 2 | 2 | 3 >>>>> 80615 | 0 | 3 | 3 | 4 >>>>> 80616 | 0 | 0 | 0 | 5 >>>>> 80617 | 0 | 0 | 0 | 6 >>>>> (6 rows) >>>>> >>>>> Note that at the end of the multi-statement transaction the command id >>>>> gets reset to zero. >>>>> >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>> >>>> >>>> >>>> >>>> -- >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company >>>> >>>> Phone: 92-334-5100153 >>>> >>>> Website: www.enterprisedb.com >>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>> >>>> This e-mail message (and any attachment) is intended for the use of >>>> the individual or entity to whom it is addressed. This message >>>> contains information from EnterpriseDB Corporation that may be >>>> privileged, confidential, or exempt from disclosure under applicable >>>> law. If you are not the intended recipient or authorized to receive >>>> this for the intended recipient, any use, dissemination, distribution, >>>> retention, archiving, or copying of this communication is strictly >>>> prohibited. If you have received this e-mail in error, please notify >>>> the sender immediately by reply e-mail and delete this message. >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> will include endpoint security, mobile security and the latest in >>>> malware >>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... >>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>>> >>> >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Abbas B. <abb...@en...> - 2012-07-04 03:35:20
|
While fixing the regression failures resulting from the changes done by the patch I was able to fix all except this test case set enforce_two_phase_commit = off; CREATE TEMP TABLE users ( id INT PRIMARY KEY, name VARCHAR NOT NULL ) DISTRIBUTE BY REPLICATION; INSERT INTO users VALUES (1, 'Jozko'); INSERT INTO users VALUES (2, 'Ferko'); INSERT INTO users VALUES (3, 'Samko'); CREATE TEMP TABLE tasks ( id INT PRIMARY KEY, owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL ) DISTRIBUTE BY REPLICATION; INSERT INTO tasks VALUES (1,1,NULL,NULL); INSERT INTO tasks VALUES (2,2,2,NULL); INSERT INTO tasks VALUES (3,3,3,3); BEGIN; UPDATE tasks set id=id WHERE id=2; SELECT * FROM tasks; DELETE FROM users WHERE id = 2; SELECT * FROM tasks; COMMIT; The obtained output from the last select statement is id | owner | worker | checked_by ----+-------+--------+------------ 1 | 1 | | 3 | 3 | 3 | 3 2 | 2 | 2 | (3 rows) where as the expected output is id | owner | worker | checked_by ----+-------+--------+------------ 1 | 1 | | 3 | 3 | 3 | 3 2 | | | (3 rows) Note that the owner and worker have been set to null due to "ON DELETE SET NULL". Here is the reason why this does not work properly. Consider the last transaction BEGIN; UPDATE tasks set id=id WHERE id=2; SELECT * FROM tasks; DELETE FROM users WHERE id = 2; SELECT * FROM tasks; COMMIT; Here are the command id values the coordinator sends to the data node 0 for the first update that gets incremented to 1 because this is a DML and needs to consume a command id 1 for the first select that remains 1 since it is not required to be consumed. 1 for the delete statement that gets incremented to 2 because it is a DML and 2 for the last select. Now this is what happens on the data node When the data node receives the first update with command id 0, it increments it once due to the update itself and once due to the update run because of "ON UPDATE CASCADE". Hence the command id at the end of update on data node is 2. The first select comes to data node with command id 1, which is incorrect. The user's intention is to see data after update and its command id should be 2. Now delete comes with command id 1, and data node increments it once due to the delete itself and once due to the update run because of "ON DELETE SET NULL", hence the command id at the end of delete is 3. Coordinator now sends last select with command id 2, which is again incorrect since user's intention is to see data after delete and select should have been sent to data node with command id 3 or 4. Every time data node increments command id due to any statements run implicitly either because of the constraints or triggers, this scheme of sending command ids to data node from coordinator to solve fetch problems would fail. Datanode can have a trigger e.g. inserting rows thrice on every single insert and would increment command id on every insert. Therefore this design cannot work. Either we have to synchronize command ids between datanode and coordinator through GTM OR We will have to send the DECLARE CURSOR down to the datanode. In this case however we will not be able to send the cursor query as it is because the query might contain a join on two tables which exist on a disjoint set of data nodes. Comments or suggestions are welcome. On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt <abb...@en...>wrote: > Thanks for your comments. > > On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> Hi Abbas, >> I have few comments to make >> 1. With this patch there are two variables for having command Id, that is >> going to cause confusion and will be a maintenance burden, might be error >> prone. Is it possible to use a single variable instead of two? > > > Are you talking about receivedCommandId and currentCommandId? If yes, I > would prefer not having a packet received from coordinator overwrite the > currentCommandId at data node, because I am not 100% sure about the life > time of currentCommandId, I might overwrite it before time. It would be > safe to let currentCommandId as is unless we are compelled to get the next > command ID, and have the received command id take priority at that time. > > >> Right now there is some code which is specific to cursors in your patch. >> If you can plug the coordinator command id somehow into currentCommandId, >> you won't need that code and any other code which needs coordinator command >> ID will be automatically taken care of. >> > > That code is required to solve a problem. Consider this case when a > coordinator received this transaction > > > BEGIN; > insert into tt1 values(1); > declare c50 cursor for select * from tt1; > insert into tt1 values(2); > fetch all from c50; > COMMIT; > > While sending select to the data node in response to a fetch we need to > know what was the command ID of the declare cursor statement and we need to > send that command ID to the data node for this particular fetch. This is > the main idea behind this solution. > > The first insert goes to the data node with command id 0, the second > insert goes with 2. Command ID 1 is consumed by declare cursor. When > coordinator sees fetch it needs to send select to the data node with > command ID 1 rather than 3. > > > >> 2. A non-transaction on coordinator can spawn tranasactions on datanode >> or subtransactions (if there is already a transaction running). Does your >> patch handle that case? > > > No and it does not need to, because that case has no known problems that > we need to solve. I don't think my patch would impact any such case but I > will analyze any failures that I may get in regressions. > > >> Should we do more thorough research in the transaction management, esp. >> to see the impact of getting same command id for two commands on the >> datanode? >> > > If we issue two commands with the same command ID then we will definitely > have visibility issues according to the rules I have already explained. But > we will not have two commands sent to the data node with same command id. > > >> >> >> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt <abb...@en...>wrote: >> >>> Hi Ashutosh, >>> Here are the results with the val column, Thanks. >>> >>> test=# drop table mvcc_demo; >>> DROP TABLE >>> test=# >>> test=# create table mvcc_demo (val int); >>> CREATE TABLE >>> test=# >>> test=# TRUNCATE mvcc_demo; >>> TRUNCATE TABLE >>> test=# >>> test=# BEGIN; >>> BEGIN >>> test=# DELETE FROM mvcc_demo; -- increment command id to show that combo >>> id would be different >>> DELETE 0 >>> test=# DELETE FROM mvcc_demo; >>> DELETE 0 >>> test=# DELETE FROM mvcc_demo; >>> DELETE 0 >>> test=# INSERT INTO mvcc_demo VALUES (1); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (2); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (3); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80689 | 0 | 3 | f >>> 80689 | 0 | 4 | f >>> 80689 | 0 | 5 | f >>> (3 rows) >>> >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> xmin | xmax | cmin | cmax | val >>> -------+------+------+------+----- >>> 80689 | 0 | 3 | 3 | 1 >>> 80689 | 0 | 4 | 4 | 2 >>> 80689 | 0 | 5 | 5 | 3 >>> >>> (3 rows) >>> >>> test=# >>> test=# DELETE FROM mvcc_demo; >>> DELETE 3 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80689 | 80689 | 0 | t >>> 80689 | 80689 | 1 | t >>> 80689 | 80689 | 2 | t >>> (3 rows) >>> >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> xmin | xmax | cmin | cmax | val >>> ------+------+------+------+----- >>> (0 rows) >>> >>> >>> test=# >>> test=# END; >>> COMMIT >>> test=# >>> test=# >>> test=# TRUNCATE mvcc_demo; >>> TRUNCATE TABLE >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> test=# BEGIN; >>> BEGIN >>> test=# INSERT INTO mvcc_demo VALUES (1); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (2); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (3); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80693 | 0 | 0 | f >>> 80693 | 0 | 1 | f >>> 80693 | 0 | 2 | f >>> (3 rows) >>> >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> xmin | xmax | cmin | cmax | val >>> -------+------+------+------+----- >>> 80693 | 0 | 0 | 0 | 1 >>> 80693 | 0 | 1 | 1 | 2 >>> 80693 | 0 | 2 | 2 | 3 >>> (3 rows) >>> >>> test=# >>> test=# UPDATE mvcc_demo SET val = 10; >>> >>> UPDATE 3 >>> test=# >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80693 | 80693 | 0 | t >>> 80693 | 80693 | 1 | t >>> 80693 | 80693 | 2 | t >>> 80693 | 0 | 3 | f >>> 80693 | 0 | 3 | f >>> 80693 | 0 | 3 | f >>> (6 rows) >>> >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> xmin | xmax | cmin | cmax | val >>> -------+------+------+------+----- >>> 80693 | 0 | 3 | 3 | 10 >>> 80693 | 0 | 3 | 3 | 10 >>> 80693 | 0 | 3 | 3 | 10 >>> (3 rows) >>> >>> >>> test=# >>> test=# END; >>> COMMIT >>> test=# >>> test=# TRUNCATE mvcc_demo; >>> TRUNCATE TABLE >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -- From one psql issue >>> test=# INSERT INTO mvcc_demo VALUES (1); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80699 | 0 | 0 | f >>> (1 row) >>> >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> xmin | xmax | cmin | cmax | val >>> -------+------+------+------+----- >>> 80699 | 0 | 0 | 0 | 1 >>> (1 row) >>> >>> >>> >>> >>> >>> test=# -- From another issue >>> test=# BEGIN; >>> BEGIN >>> test=# INSERT INTO mvcc_demo VALUES (2); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (3); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (4); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80699 | 0 | 0 | f >>> 80700 | 0 | 0 | f >>> 80700 | 0 | 1 | f >>> 80700 | 0 | 2 | f >>> (4 rows) >>> >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> xmin | xmax | cmin | cmax | val >>> -------+------+------+------+----- >>> 80699 | 0 | 0 | 0 | 1 >>> 80700 | 0 | 0 | 0 | 2 >>> 80700 | 0 | 1 | 1 | 3 >>> 80700 | 0 | 2 | 2 | 4 >>> (4 rows) >>> >>> test=# >>> test=# UPDATE mvcc_demo SET val = 10; >>> >>> UPDATE 4 >>> test=# >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80700 | 80700 | 0 | t >>> 80700 | 80700 | 1 | t >>> 80700 | 80700 | 2 | t >>> 80699 | 80700 | 3 | f >>> 80700 | 0 | 3 | f >>> 80700 | 0 | 3 | f >>> 80700 | 0 | 3 | f >>> 80700 | 0 | 3 | f >>> (8 rows) >>> >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> xmin | xmax | cmin | cmax | val >>> -------+------+------+------+----- >>> 80700 | 0 | 3 | 3 | 10 >>> 80700 | 0 | 3 | 3 | 10 >>> 80700 | 0 | 3 | 3 | 10 >>> 80700 | 0 | 3 | 3 | 10 >>> (4 rows) >>> >>> >>> >>> >>> test=# -- Before finishing this, issue these from the first psql >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80700 | 80700 | 0 | t >>> 80700 | 80700 | 1 | t >>> 80700 | 80700 | 2 | t >>> 80699 | 80700 | 3 | f >>> 80700 | 0 | 3 | f >>> 80700 | 0 | 3 | f >>> 80700 | 0 | 3 | f >>> 80700 | 0 | 3 | f >>> (8 rows) >>> >>> test=# >>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> xmin | xmax | cmin | cmax | val >>> -------+-------+------+------+----- >>> 80699 | 80700 | 3 | 3 | 1 >>> (1 row) >>> >>> test=# end; >>> COMMIT >>> >>> >>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi, >>>> >>>> I expect pgxc_node_send_cmd_id to have some impact on performance, so >>>> be sure to send it to remote Datanodes really only if necessary. >>>> You should put more severe conditions blocking this function cid can >>>> easily get incremented in Postgres. >>>> >>>> Regards, >>>> >>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt < >>>> abb...@en...> wrote: >>>> >>>>> PFA a WIP patch implementing the design presented earlier. >>>>> The patch is WIP because it still has and FIXME and it shows some >>>>> regression failures that need to be fixed, but other than that it confirms >>>>> that the suggested design would work fine. The following test cases now >>>>> work fine >>>>> >>>>> drop table tt1; >>>>> create table tt1(f1 int) distribute by replication; >>>>> >>>>> >>>>> BEGIN; >>>>> insert into tt1 values(1); >>>>> declare c50 cursor for select * from tt1; >>>>> insert into tt1 values(2); >>>>> fetch all from c50; >>>>> COMMIT; >>>>> truncate table tt1; >>>>> >>>>> BEGIN; >>>>> >>>>> declare c50 cursor for select * from tt1; >>>>> insert into tt1 values(1); >>>>> >>>>> insert into tt1 values(2); >>>>> fetch all from c50; >>>>> COMMIT; >>>>> truncate table tt1; >>>>> >>>>> >>>>> BEGIN; >>>>> insert into tt1 values(1); >>>>> insert into tt1 values(2); >>>>> >>>>> declare c50 cursor for select * from tt1; >>>>> insert into tt1 values(3); >>>>> >>>>> fetch all from c50; >>>>> COMMIT; >>>>> truncate table tt1; >>>>> >>>>> >>>>> BEGIN; >>>>> insert into tt1 values(1); >>>>> declare c50 cursor for select * from tt1; >>>>> insert into tt1 values(2); >>>>> declare c51 cursor for select * from tt1; >>>>> insert into tt1 values(3); >>>>> fetch all from c50; >>>>> fetch all from c51; >>>>> COMMIT; >>>>> truncate table tt1; >>>>> >>>>> >>>>> BEGIN; >>>>> insert into tt1 values(1); >>>>> declare c50 cursor for select * from tt1; >>>>> declare c51 cursor for select * from tt1; >>>>> insert into tt1 values(2); >>>>> insert into tt1 values(3); >>>>> fetch all from c50; >>>>> fetch all from c51; >>>>> COMMIT; >>>>> truncate table tt1; >>>>> >>>>> >>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt < >>>>> abb...@en...> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> In a multi-statement transaction each statement is given a command >>>>>> identifier >>>>>> starting from zero and incrementing for each statement. >>>>>> These command indentifers are required for extra tracking because each >>>>>> statement has its own visibility rules with in the transaction. >>>>>> For example, a cursor’s contents must remain unchanged even if later >>>>>> statements in the >>>>>> same transaction modify rows. Such tracking is implemented using >>>>>> system command id >>>>>> columns cmin/cmax, which is internally actually is a single column. >>>>>> >>>>>> cmin/cmax come into play in case of multi-statement transactions >>>>>> only, >>>>>> they are both zero otherwise. >>>>>> >>>>>> cmin "The command identifier of the statement within the inserting >>>>>> transaction." >>>>>> cmax "The command identifier of the statement within the deleting >>>>>> transaction." >>>>>> >>>>>> Here are the visibility rules (taken from comments of tqual.c) >>>>>> >>>>>> ( // A heap tuple is valid >>>>>> "now" iff >>>>>> Xmin == my-transaction && // inserted by the current >>>>>> transaction >>>>>> Cmin < my-command && // before this command, and >>>>>> ( >>>>>> Xmax is null || // the row has not been >>>>>> deleted, or >>>>>> ( >>>>>> Xmax == my-transaction && // it was deleted by the >>>>>> current transaction >>>>>> Cmax >= my-command // but not before this >>>>>> command, >>>>>> ) >>>>>> ) >>>>>> ) >>>>>> || // or >>>>>> ( >>>>>> Xmin is committed && // the row was inserted by a >>>>>> committed transaction, and >>>>>> ( >>>>>> Xmax is null || // the row has not been >>>>>> deleted, or >>>>>> ( >>>>>> Xmax == my-transaction && // the row is being deleted >>>>>> by this transaction >>>>>> Cmax >= my-command) || // but it's not deleted >>>>>> "yet", or >>>>>> ( >>>>>> Xmax != my-transaction && // the row was deleted by >>>>>> another transaction >>>>>> Xmax is not committed // that has not been >>>>>> committed >>>>>> ) >>>>>> ) >>>>>> ) >>>>>> ) >>>>>> >>>>>> Because cmin and cmax are internally a single system column, >>>>>> it is therefore not possible to simply record the status of a row >>>>>> that is created and expired in the same multi-statement transaction. >>>>>> For that reason, a special combo command id is created that >>>>>> references >>>>>> a local memory hash that contains the actual cmin and cmax values. >>>>>> It means that if combo id is being used the number we are seeing >>>>>> would not be the cmin or cmax it will be an index into a local >>>>>> array that contains a structure with has the actual cmin and cmax >>>>>> values. >>>>>> >>>>>> The following queries (taken mostly from >>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>>>>> use the contrib module pageinspect, which allows >>>>>> visibility of internal heap page structures and all stored rows, >>>>>> including those not visible in the current snapshot. >>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>>>>> >>>>>> We are exploring 3 examples here: >>>>>> 1) INSERT & DELETE in a single transaction >>>>>> 2) INSERT & UPDATE in a single transaction >>>>>> 3) INSERT from two different transactions & UPDATE from one >>>>>> >>>>>> test=# drop table mvcc_demo; >>>>>> DROP TABLE >>>>>> test=# >>>>>> test=# create table mvcc_demo (val int); >>>>>> CREATE TABLE >>>>>> test=# >>>>>> test=# TRUNCATE mvcc_demo; >>>>>> TRUNCATE TABLE >>>>>> test=# >>>>>> test=# BEGIN; >>>>>> BEGIN >>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>>>> combo id would be different >>>>>> DELETE 0 >>>>>> test=# DELETE FROM mvcc_demo; >>>>>> DELETE 0 >>>>>> test=# DELETE FROM mvcc_demo; >>>>>> DELETE 0 >>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>> INSERT 0 1 >>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>> INSERT 0 1 >>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>> INSERT 0 1 >>>>>> test=# SELECT t_xmin AS xmin, >>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>> is_combocid >>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>> test-# ORDER BY 2 DESC, 3; >>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>> -------+------+-----------+------------- >>>>>> 80685 | 0 | 3 | f >>>>>> 80685 | 0 | 4 | f >>>>>> 80685 | 0 | 5 | f >>>>>> (3 rows) >>>>>> >>>>>> test=# >>>>>> test=# DELETE FROM mvcc_demo; >>>>>> DELETE 3 >>>>>> test=# SELECT t_xmin AS xmin, >>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>> is_combocid >>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>> test-# ORDER BY 2 DESC, 3; >>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>> -------+-------+-----------+------------- >>>>>> 80685 | 80685 | 0 | t >>>>>> 80685 | 80685 | 1 | t >>>>>> 80685 | 80685 | 2 | t >>>>>> (3 rows) >>>>>> >>>>>> Note that since is_combocid is true the numbers are not cmin/cmax >>>>>> they are actually >>>>>> the indexes of the internal array already explained above. >>>>>> combo id index 0 would contain cmin 3, cmax 6 >>>>>> combo id index 1 would contain cmin 4, cmax 6 >>>>>> combo id index 2 would contain cmin 5, cmax 6 >>>>>> >>>>>> test=# >>>>>> test=# END; >>>>>> COMMIT >>>>>> test=# >>>>>> test=# >>>>>> test=# TRUNCATE mvcc_demo; >>>>>> TRUNCATE TABLE >>>>>> test=# >>>>>> test=# >>>>>> test=# >>>>>> test=# BEGIN; >>>>>> BEGIN >>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>> INSERT 0 1 >>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>> INSERT 0 1 >>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>> INSERT 0 1 >>>>>> test=# SELECT t_xmin AS xmin, >>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>> is_combocid >>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>> test-# ORDER BY 2 DESC, 3; >>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>> -------+------+-----------+------------- >>>>>> 80675 | 0 | 0 | f >>>>>> 80675 | 0 | 1 | f >>>>>> 80675 | 0 | 2 | f >>>>>> (3 rows) >>>>>> >>>>>> test=# >>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>> UPDATE 3 >>>>>> test=# >>>>>> test=# SELECT t_xmin AS xmin, >>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>> is_combocid >>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>> test-# ORDER BY 2 DESC, 3; >>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>> -------+-------+-----------+------------- >>>>>> 80675 | 80675 | 0 | t >>>>>> 80675 | 80675 | 1 | t >>>>>> 80675 | 80675 | 2 | t >>>>>> 80675 | 0 | 3 | f >>>>>> 80675 | 0 | 3 | f >>>>>> 80675 | 0 | 3 | f >>>>>> (6 rows) >>>>>> >>>>>> test=# >>>>>> test=# END; >>>>>> COMMIT >>>>>> test=# >>>>>> test=# >>>>>> test=# TRUNCATE mvcc_demo; >>>>>> TRUNCATE TABLE >>>>>> test=# >>>>>> >>>>>> -- From one psql issue >>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>> INSERT 0 1 >>>>>> test=# SELECT t_xmin AS xmin, >>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>> is_combocid >>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>> test-# ORDER BY 2 DESC, 3; >>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>> -------+------+-----------+------------- >>>>>> 80677 | 0 | 0 | f >>>>>> (1 row) >>>>>> >>>>>> >>>>>> test=# -- From another issue >>>>>> test=# BEGIN; >>>>>> BEGIN >>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>> INSERT 0 1 >>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>> INSERT 0 1 >>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>>>> INSERT 0 1 >>>>>> test=# SELECT t_xmin AS xmin, >>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>> is_combocid >>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>> test-# ORDER BY 2 DESC, 3; >>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>> -------+------+-----------+------------- >>>>>> 80677 | 0 | 0 | f >>>>>> 80678 | 0 | 0 | f >>>>>> 80678 | 0 | 1 | f >>>>>> 80678 | 0 | 2 | f >>>>>> (4 rows) >>>>>> >>>>>> test=# >>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>> UPDATE 4 >>>>>> test=# SELECT t_xmin AS xmin, >>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>> is_combocid >>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>> test-# ORDER BY 2 DESC, 3; >>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>> -------+-------+-----------+------------- >>>>>> 80678 | 80678 | 0 | t >>>>>> 80678 | 80678 | 1 | t >>>>>> 80678 | 80678 | 2 | t >>>>>> 80677 | 80678 | 3 | f >>>>>> 80678 | 0 | 3 | f >>>>>> 80678 | 0 | 3 | f >>>>>> 80678 | 0 | 3 | f >>>>>> 80678 | 0 | 3 | f >>>>>> (8 rows) >>>>>> >>>>>> test=# >>>>>> >>>>>> test=# -- Before finishing this, issue these from the first psql >>>>>> test=# SELECT t_xmin AS xmin, >>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>> is_combocid >>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>> test-# ORDER BY 2 DESC, 3; >>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>> -------+-------+-----------+------------- >>>>>> 80678 | 80678 | 0 | t >>>>>> 80678 | 80678 | 1 | t >>>>>> 80678 | 80678 | 2 | t >>>>>> 80677 | 80678 | 3 | f >>>>>> 80678 | 0 | 3 | f >>>>>> 80678 | 0 | 3 | f >>>>>> 80678 | 0 | 3 | f >>>>>> 80678 | 0 | 3 | f >>>>>> (8 rows) >>>>>> >>>>>> test=# END; >>>>>> COMMIT >>>>>> >>>>>> >>>>>> Now consider the case we are trying to solve >>>>>> >>>>>> drop table tt1; >>>>>> create table tt1(f1 int); >>>>>> >>>>>> BEGIN; >>>>>> insert into tt1 values(1); >>>>>> declare c50 cursor for select * from tt1; -- should show one row only >>>>>> insert into tt1 values(2); >>>>>> fetch all from c50; >>>>>> COMMIT; >>>>>> >>>>>> >>>>>> Consider Data node 1 log >>>>>> >>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>> committed READ WRITE] >>>>>> (b) [exec_simple_query][1026][drop table tt1;] >>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>> committed READ WRITE] >>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>> committed READ WRITE] >>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() >>>>>> FROM tt1] >>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>>>>> >>>>>> The cursor currently shows both inserted rows because command id at >>>>>> data node in >>>>>> step (j) is 0 >>>>>> step (k) is 1 & >>>>>> step (l) is 2 >>>>>> >>>>>> Where as we need command ids to be >>>>>> >>>>>> step (j) should be 0 >>>>>> step (k) should be 2 & >>>>>> step (l) should be 1 >>>>>> >>>>>> This will solve the cursor visibility problem. >>>>>> >>>>>> To implement this I suggest we send command IDs to data nodes from >>>>>> the coordinator >>>>>> like we send gxid. The only difference will be that we do not need to >>>>>> take command IDs >>>>>> from GTM since they are only valid with in the transaction. >>>>>> >>>>>> See this example >>>>>> >>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>> xmin | xmax | cmin | cmax | f1 >>>>>> ------+------+------+------+---- >>>>>> (0 rows) >>>>>> >>>>>> test=# begin; >>>>>> BEGIN >>>>>> test=# insert into tt1 values(1); >>>>>> INSERT 0 1 >>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>> xmin | xmax | cmin | cmax | f1 >>>>>> -------+------+------+------+---- >>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>> (1 row) >>>>>> >>>>>> test=# insert into tt1 values(2); >>>>>> INSERT 0 1 >>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>> xmin | xmax | cmin | cmax | f1 >>>>>> -------+------+------+------+---- >>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>> (2 rows) >>>>>> >>>>>> test=# insert into tt1 values(3); >>>>>> INSERT 0 1 >>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>> xmin | xmax | cmin | cmax | f1 >>>>>> -------+------+------+------+---- >>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>> (3 rows) >>>>>> >>>>>> test=# insert into tt1 values(4); >>>>>> INSERT 0 1 >>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>> xmin | xmax | cmin | cmax | f1 >>>>>> -------+------+------+------+---- >>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>> (4 rows) >>>>>> >>>>>> test=# end; >>>>>> COMMIT >>>>>> test=# >>>>>> test=# >>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>> xmin | xmax | cmin | cmax | f1 >>>>>> -------+------+------+------+---- >>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>> (4 rows) >>>>>> >>>>>> test=# insert into tt1 values(5); >>>>>> INSERT 0 1 >>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>> xmin | xmax | cmin | cmax | f1 >>>>>> -------+------+------+------+---- >>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>> (5 rows) >>>>>> >>>>>> test=# insert into tt1 values(6); >>>>>> INSERT 0 1 >>>>>> test=# >>>>>> test=# >>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>> xmin | xmax | cmin | cmax | f1 >>>>>> -------+------+------+------+---- >>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>> 80617 | 0 | 0 | 0 | 6 >>>>>> (6 rows) >>>>>> >>>>>> Note that at the end of the multi-statement transaction the command >>>>>> id gets reset to zero. >>>>>> >>>>>> -- >>>>>> Abbas >>>>>> Architect >>>>>> EnterpriseDB Corporation >>>>>> The Enterprise PostgreSQL Company >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>>> >>>>> Phone: 92-334-5100153 >>>>> >>>>> Website: www.enterprisedb.com >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>> >>>>> This e-mail message (and any attachment) is intended for the use of >>>>> the individual or entity to whom it is addressed. This message >>>>> contains information from EnterpriseDB Corporation that may be >>>>> privileged, confidential, or exempt from disclosure under applicable >>>>> law. If you are not the intended recipient or authorized to receive >>>>> this for the intended recipient, any use, dissemination, distribution, >>>>> retention, archiving, or copying of this communication is strictly >>>>> prohibited. If you have received this e-mail in error, please notify >>>>> the sender immediately by reply e-mail and delete this message. >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> Live Security Virtual Conference >>>>> Exclusive live event will cover all the ways today's security and >>>>> threat landscape has changed and how IT managers can respond. >>>>> Discussions >>>>> will include endpoint security, mobile security and the latest in >>>>> malware >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>> _______________________________________________ >>>>> Postgres-xc-developers mailing list >>>>> Pos...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>> >>>>> >>>> >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >>> >>> Phone: 92-334-5100153 >>> >>> Website: www.enterprisedb.com >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Abbas B. <abb...@en...> - 2012-07-09 20:04:11
|
Here is the update on the issue. It was decided that the changes done by the data nodes in the command id should be communicated back to the coordinator and that the coordinator should choose the largest of all the received values as the next command id. It was suggested that we should check that a skipped value of command id should not create a problem for the next operations on the table. I have verified both by studying the code and by actually changing the function CommandCounterIncrement to increment the command id by 3 and running regression. It worked fine so a hole in command id is not a problem. Next it was suggested that we should use the mechanism currently in place to send # of tuples affected by a statement to communicate the changed command id to the coordinator. Please refer to this link in the documentation http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html Note that there is no such message format that exists in the current over the wire protocol to communicate the # of tuples affected by a statement. The libpq functions that we might suspect do the same are PQntuples and PQcmdTuples. PQntuples simply returns ntups member of PGresult, where as PQcmdTuples extracts the # of tuples affected from the CommandComplete 'C' message string. We cannot use these mechanisms for our purpose. I evaluated the use of NoticeResponse 'N' for sending changed command id but the message format of NoticeResponse mandates the use of certain fields which will make our sent messages un-necessarily bulky and would consume network bandwidth for no reason. I therefore suggest the we use a new message for communicating XC specific information from data node to coordinator. Currently we will use it for command id but we will design the message format flexible enough to accommodate future XC requirements. Whenever the data node increments command id we will send the information to the coordinator and handle_response function in execRemote.c would be changed to accommodate new message. Since coordinators will never use the new message therefore the existing clients do not need to bother. Comments or suggestions are welcome. Regards On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en...>wrote: > While fixing the regression failures resulting from the changes done by > the patch I was able to fix all except this test case > > set enforce_two_phase_commit = off; > > CREATE TEMP TABLE users ( > id INT PRIMARY KEY, > name VARCHAR NOT NULL > ) DISTRIBUTE BY REPLICATION; > > INSERT INTO users VALUES (1, 'Jozko'); > INSERT INTO users VALUES (2, 'Ferko'); > INSERT INTO users VALUES (3, 'Samko'); > CREATE TEMP TABLE tasks ( > id INT PRIMARY KEY, > owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, > worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, > checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL > ) DISTRIBUTE BY REPLICATION; > > INSERT INTO tasks VALUES (1,1,NULL,NULL); > INSERT INTO tasks VALUES (2,2,2,NULL); > INSERT INTO tasks VALUES (3,3,3,3); > > BEGIN; > > UPDATE tasks set id=id WHERE id=2; > SELECT * FROM tasks; > > DELETE FROM users WHERE id = 2; > SELECT * FROM tasks; > > COMMIT; > > The obtained output from the last select statement is > > id | owner | worker | checked_by > ----+-------+--------+------------ > 1 | 1 | | > 3 | 3 | 3 | 3 > 2 | 2 | 2 | > (3 rows) > > where as the expected output is > > id | owner | worker | checked_by > ----+-------+--------+------------ > 1 | 1 | | > 3 | 3 | 3 | 3 > 2 | | | > (3 rows) > > Note that the owner and worker have been set to null due to "ON DELETE SET > NULL". > > Here is the reason why this does not work properly. Consider the last > transaction > > BEGIN; > > UPDATE tasks set id=id WHERE id=2; > SELECT * FROM tasks; > > DELETE FROM users WHERE id = 2; > SELECT * FROM tasks; > > COMMIT; > > Here are the command id values the coordinator sends to the data node > > 0 for the first update that gets incremented to 1 because this is a DML > and needs to consume a command id > 1 for the first select that remains 1 since it is not required to be > consumed. > 1 for the delete statement that gets incremented to 2 because it is a DML > and 2 for the last select. > > Now this is what happens on the data node > > When the data node receives the first update with command id 0, it > increments it once due to the update itself and once due to the update run > because of "ON UPDATE CASCADE". Hence the command id at the end of update > on data node is 2. > The first select comes to data node with command id 1, which is incorrect. > The user's intention is to see data after update and its command id should > be 2. > Now delete comes with command id 1, and data node increments it once due > to the delete itself and once due to the update run because of "ON DELETE > SET NULL", hence the command id at the end of delete is 3. > Coordinator now sends last select with command id 2, which is again > incorrect since user's intention is to see data after delete and select > should have been sent to data node with command id 3 or 4. > > Every time data node increments command id due to any statements run > implicitly either because of the constraints or triggers, this scheme of > sending command ids to data node from coordinator to solve fetch problems > would fail. > > Datanode can have a trigger e.g. inserting rows thrice on every single > insert and would increment command id on every insert. Therefore this > design cannot work. > > Either we have to synchronize command ids between datanode and coordinator > through GTM > OR > We will have to send the DECLARE CURSOR down to the datanode. In this case > however we will not be able to send the cursor query as it is because the > query might contain a join on two tables which exist on a disjoint set of > data nodes. > > Comments or suggestions are welcome. > > > > On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt <abb...@en...>wrote: > >> Thanks for your comments. >> >> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> Hi Abbas, >>> I have few comments to make >>> 1. With this patch there are two variables for having command Id, that >>> is going to cause confusion and will be a maintenance burden, might be >>> error prone. Is it possible to use a single variable instead of two? >> >> >> Are you talking about receivedCommandId and currentCommandId? If yes, I >> would prefer not having a packet received from coordinator overwrite the >> currentCommandId at data node, because I am not 100% sure about the life >> time of currentCommandId, I might overwrite it before time. It would be >> safe to let currentCommandId as is unless we are compelled to get the next >> command ID, and have the received command id take priority at that time. >> >> >>> Right now there is some code which is specific to cursors in your patch. >>> If you can plug the coordinator command id somehow into currentCommandId, >>> you won't need that code and any other code which needs coordinator command >>> ID will be automatically taken care of. >>> >> >> That code is required to solve a problem. Consider this case when a >> coordinator received this transaction >> >> >> BEGIN; >> insert into tt1 values(1); >> declare c50 cursor for select * from tt1; >> insert into tt1 values(2); >> fetch all from c50; >> COMMIT; >> >> While sending select to the data node in response to a fetch we need to >> know what was the command ID of the declare cursor statement and we need to >> send that command ID to the data node for this particular fetch. This is >> the main idea behind this solution. >> >> The first insert goes to the data node with command id 0, the second >> insert goes with 2. Command ID 1 is consumed by declare cursor. When >> coordinator sees fetch it needs to send select to the data node with >> command ID 1 rather than 3. >> >> >> >>> 2. A non-transaction on coordinator can spawn tranasactions on datanode >>> or subtransactions (if there is already a transaction running). Does your >>> patch handle that case? >> >> >> No and it does not need to, because that case has no known problems that >> we need to solve. I don't think my patch would impact any such case but I >> will analyze any failures that I may get in regressions. >> >> >>> Should we do more thorough research in the transaction management, esp. >>> to see the impact of getting same command id for two commands on the >>> datanode? >>> >> >> If we issue two commands with the same command ID then we will definitely >> have visibility issues according to the rules I have already explained. But >> we will not have two commands sent to the data node with same command id. >> >> >>> >>> >>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt <abb...@en... >>> > wrote: >>> >>>> Hi Ashutosh, >>>> Here are the results with the val column, Thanks. >>>> >>>> test=# drop table mvcc_demo; >>>> DROP TABLE >>>> test=# >>>> test=# create table mvcc_demo (val int); >>>> CREATE TABLE >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> test=# >>>> test=# BEGIN; >>>> BEGIN >>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>> combo id would be different >>>> DELETE 0 >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 0 >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 0 >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80689 | 0 | 3 | f >>>> 80689 | 0 | 4 | f >>>> 80689 | 0 | 5 | f >>>> (3 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80689 | 0 | 3 | 3 | 1 >>>> 80689 | 0 | 4 | 4 | 2 >>>> 80689 | 0 | 5 | 5 | 3 >>>> >>>> (3 rows) >>>> >>>> test=# >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 3 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80689 | 80689 | 0 | t >>>> 80689 | 80689 | 1 | t >>>> 80689 | 80689 | 2 | t >>>> (3 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> ------+------+------+------+----- >>>> (0 rows) >>>> >>>> >>>> test=# >>>> test=# END; >>>> COMMIT >>>> test=# >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> test=# BEGIN; >>>> BEGIN >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80693 | 0 | 0 | f >>>> 80693 | 0 | 1 | f >>>> 80693 | 0 | 2 | f >>>> (3 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80693 | 0 | 0 | 0 | 1 >>>> 80693 | 0 | 1 | 1 | 2 >>>> 80693 | 0 | 2 | 2 | 3 >>>> (3 rows) >>>> >>>> test=# >>>> test=# UPDATE mvcc_demo SET val = 10; >>>> >>>> UPDATE 3 >>>> test=# >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80693 | 80693 | 0 | t >>>> 80693 | 80693 | 1 | t >>>> 80693 | 80693 | 2 | t >>>> 80693 | 0 | 3 | f >>>> 80693 | 0 | 3 | f >>>> 80693 | 0 | 3 | f >>>> (6 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80693 | 0 | 3 | 3 | 10 >>>> 80693 | 0 | 3 | 3 | 10 >>>> 80693 | 0 | 3 | 3 | 10 >>>> (3 rows) >>>> >>>> >>>> test=# >>>> test=# END; >>>> COMMIT >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- From one psql issue >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80699 | 0 | 0 | f >>>> (1 row) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80699 | 0 | 0 | 0 | 1 >>>> (1 row) >>>> >>>> >>>> >>>> >>>> >>>> test=# -- From another issue >>>> test=# BEGIN; >>>> BEGIN >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80699 | 0 | 0 | f >>>> 80700 | 0 | 0 | f >>>> 80700 | 0 | 1 | f >>>> 80700 | 0 | 2 | f >>>> (4 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80699 | 0 | 0 | 0 | 1 >>>> 80700 | 0 | 0 | 0 | 2 >>>> 80700 | 0 | 1 | 1 | 3 >>>> 80700 | 0 | 2 | 2 | 4 >>>> (4 rows) >>>> >>>> test=# >>>> test=# UPDATE mvcc_demo SET val = 10; >>>> >>>> UPDATE 4 >>>> test=# >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80700 | 80700 | 0 | t >>>> 80700 | 80700 | 1 | t >>>> 80700 | 80700 | 2 | t >>>> 80699 | 80700 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> (8 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80700 | 0 | 3 | 3 | 10 >>>> 80700 | 0 | 3 | 3 | 10 >>>> 80700 | 0 | 3 | 3 | 10 >>>> 80700 | 0 | 3 | 3 | 10 >>>> (4 rows) >>>> >>>> >>>> >>>> >>>> test=# -- Before finishing this, issue these from the first psql >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80700 | 80700 | 0 | t >>>> 80700 | 80700 | 1 | t >>>> 80700 | 80700 | 2 | t >>>> 80699 | 80700 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> (8 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+-------+------+------+----- >>>> 80699 | 80700 | 3 | 3 | 1 >>>> (1 row) >>>> >>>> test=# end; >>>> COMMIT >>>> >>>> >>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier < >>>> mic...@gm...> wrote: >>>> >>>>> Hi, >>>>> >>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, so >>>>> be sure to send it to remote Datanodes really only if necessary. >>>>> You should put more severe conditions blocking this function cid can >>>>> easily get incremented in Postgres. >>>>> >>>>> Regards, >>>>> >>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt < >>>>> abb...@en...> wrote: >>>>> >>>>>> PFA a WIP patch implementing the design presented earlier. >>>>>> The patch is WIP because it still has and FIXME and it shows some >>>>>> regression failures that need to be fixed, but other than that it confirms >>>>>> that the suggested design would work fine. The following test cases now >>>>>> work fine >>>>>> >>>>>> drop table tt1; >>>>>> create table tt1(f1 int) distribute by replication; >>>>>> >>>>>> >>>>>> BEGIN; >>>>>> insert into tt1 values(1); >>>>>> declare c50 cursor for select * from tt1; >>>>>> insert into tt1 values(2); >>>>>> fetch all from c50; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> BEGIN; >>>>>> >>>>>> declare c50 cursor for select * from tt1; >>>>>> insert into tt1 values(1); >>>>>> >>>>>> insert into tt1 values(2); >>>>>> fetch all from c50; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> >>>>>> BEGIN; >>>>>> insert into tt1 values(1); >>>>>> insert into tt1 values(2); >>>>>> >>>>>> declare c50 cursor for select * from tt1; >>>>>> insert into tt1 values(3); >>>>>> >>>>>> fetch all from c50; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> >>>>>> BEGIN; >>>>>> insert into tt1 values(1); >>>>>> declare c50 cursor for select * from tt1; >>>>>> insert into tt1 values(2); >>>>>> declare c51 cursor for select * from tt1; >>>>>> insert into tt1 values(3); >>>>>> fetch all from c50; >>>>>> fetch all from c51; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> >>>>>> BEGIN; >>>>>> insert into tt1 values(1); >>>>>> declare c50 cursor for select * from tt1; >>>>>> declare c51 cursor for select * from tt1; >>>>>> insert into tt1 values(2); >>>>>> insert into tt1 values(3); >>>>>> fetch all from c50; >>>>>> fetch all from c51; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> >>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt < >>>>>> abb...@en...> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> In a multi-statement transaction each statement is given a command >>>>>>> identifier >>>>>>> starting from zero and incrementing for each statement. >>>>>>> These command indentifers are required for extra tracking because >>>>>>> each >>>>>>> statement has its own visibility rules with in the transaction. >>>>>>> For example, a cursor’s contents must remain unchanged even if later >>>>>>> statements in the >>>>>>> same transaction modify rows. Such tracking is implemented using >>>>>>> system command id >>>>>>> columns cmin/cmax, which is internally actually is a single column. >>>>>>> >>>>>>> cmin/cmax come into play in case of multi-statement transactions >>>>>>> only, >>>>>>> they are both zero otherwise. >>>>>>> >>>>>>> cmin "The command identifier of the statement within the inserting >>>>>>> transaction." >>>>>>> cmax "The command identifier of the statement within the deleting >>>>>>> transaction." >>>>>>> >>>>>>> Here are the visibility rules (taken from comments of tqual.c) >>>>>>> >>>>>>> ( // A heap tuple is valid >>>>>>> "now" iff >>>>>>> Xmin == my-transaction && // inserted by the current >>>>>>> transaction >>>>>>> Cmin < my-command && // before this command, and >>>>>>> ( >>>>>>> Xmax is null || // the row has not been >>>>>>> deleted, or >>>>>>> ( >>>>>>> Xmax == my-transaction && // it was deleted by the >>>>>>> current transaction >>>>>>> Cmax >= my-command // but not before this >>>>>>> command, >>>>>>> ) >>>>>>> ) >>>>>>> ) >>>>>>> || // or >>>>>>> ( >>>>>>> Xmin is committed && // the row was inserted by >>>>>>> a committed transaction, and >>>>>>> ( >>>>>>> Xmax is null || // the row has not been >>>>>>> deleted, or >>>>>>> ( >>>>>>> Xmax == my-transaction && // the row is being deleted >>>>>>> by this transaction >>>>>>> Cmax >= my-command) || // but it's not deleted >>>>>>> "yet", or >>>>>>> ( >>>>>>> Xmax != my-transaction && // the row was deleted by >>>>>>> another transaction >>>>>>> Xmax is not committed // that has not been >>>>>>> committed >>>>>>> ) >>>>>>> ) >>>>>>> ) >>>>>>> ) >>>>>>> >>>>>>> Because cmin and cmax are internally a single system column, >>>>>>> it is therefore not possible to simply record the status of a row >>>>>>> that is created and expired in the same multi-statement transaction. >>>>>>> For that reason, a special combo command id is created that >>>>>>> references >>>>>>> a local memory hash that contains the actual cmin and cmax values. >>>>>>> It means that if combo id is being used the number we are seeing >>>>>>> would not be the cmin or cmax it will be an index into a local >>>>>>> array that contains a structure with has the actual cmin and cmax >>>>>>> values. >>>>>>> >>>>>>> The following queries (taken mostly from >>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>>>>>> use the contrib module pageinspect, which allows >>>>>>> visibility of internal heap page structures and all stored rows, >>>>>>> including those not visible in the current snapshot. >>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>>>>>> >>>>>>> We are exploring 3 examples here: >>>>>>> 1) INSERT & DELETE in a single transaction >>>>>>> 2) INSERT & UPDATE in a single transaction >>>>>>> 3) INSERT from two different transactions & UPDATE from one >>>>>>> >>>>>>> test=# drop table mvcc_demo; >>>>>>> DROP TABLE >>>>>>> test=# >>>>>>> test=# create table mvcc_demo (val int); >>>>>>> CREATE TABLE >>>>>>> test=# >>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>> TRUNCATE TABLE >>>>>>> test=# >>>>>>> test=# BEGIN; >>>>>>> BEGIN >>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>>>>> combo id would be different >>>>>>> DELETE 0 >>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>> DELETE 0 >>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>> DELETE 0 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>> INSERT 0 1 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+------+-----------+------------- >>>>>>> 80685 | 0 | 3 | f >>>>>>> 80685 | 0 | 4 | f >>>>>>> 80685 | 0 | 5 | f >>>>>>> (3 rows) >>>>>>> >>>>>>> test=# >>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>> DELETE 3 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+-------+-----------+------------- >>>>>>> 80685 | 80685 | 0 | t >>>>>>> 80685 | 80685 | 1 | t >>>>>>> 80685 | 80685 | 2 | t >>>>>>> (3 rows) >>>>>>> >>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax >>>>>>> they are actually >>>>>>> the indexes of the internal array already explained above. >>>>>>> combo id index 0 would contain cmin 3, cmax 6 >>>>>>> combo id index 1 would contain cmin 4, cmax 6 >>>>>>> combo id index 2 would contain cmin 5, cmax 6 >>>>>>> >>>>>>> test=# >>>>>>> test=# END; >>>>>>> COMMIT >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>> TRUNCATE TABLE >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# BEGIN; >>>>>>> BEGIN >>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>> INSERT 0 1 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+------+-----------+------------- >>>>>>> 80675 | 0 | 0 | f >>>>>>> 80675 | 0 | 1 | f >>>>>>> 80675 | 0 | 2 | f >>>>>>> (3 rows) >>>>>>> >>>>>>> test=# >>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>>> UPDATE 3 >>>>>>> test=# >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+-------+-----------+------------- >>>>>>> 80675 | 80675 | 0 | t >>>>>>> 80675 | 80675 | 1 | t >>>>>>> 80675 | 80675 | 2 | t >>>>>>> 80675 | 0 | 3 | f >>>>>>> 80675 | 0 | 3 | f >>>>>>> 80675 | 0 | 3 | f >>>>>>> (6 rows) >>>>>>> >>>>>>> test=# >>>>>>> test=# END; >>>>>>> COMMIT >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>> TRUNCATE TABLE >>>>>>> test=# >>>>>>> >>>>>>> -- From one psql issue >>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>> INSERT 0 1 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+------+-----------+------------- >>>>>>> 80677 | 0 | 0 | f >>>>>>> (1 row) >>>>>>> >>>>>>> >>>>>>> test=# -- From another issue >>>>>>> test=# BEGIN; >>>>>>> BEGIN >>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>>>>> INSERT 0 1 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+------+-----------+------------- >>>>>>> 80677 | 0 | 0 | f >>>>>>> 80678 | 0 | 0 | f >>>>>>> 80678 | 0 | 1 | f >>>>>>> 80678 | 0 | 2 | f >>>>>>> (4 rows) >>>>>>> >>>>>>> test=# >>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>>> UPDATE 4 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+-------+-----------+------------- >>>>>>> 80678 | 80678 | 0 | t >>>>>>> 80678 | 80678 | 1 | t >>>>>>> 80678 | 80678 | 2 | t >>>>>>> 80677 | 80678 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> (8 rows) >>>>>>> >>>>>>> test=# >>>>>>> >>>>>>> test=# -- Before finishing this, issue these from the first psql >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+-------+-----------+------------- >>>>>>> 80678 | 80678 | 0 | t >>>>>>> 80678 | 80678 | 1 | t >>>>>>> 80678 | 80678 | 2 | t >>>>>>> 80677 | 80678 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> (8 rows) >>>>>>> >>>>>>> test=# END; >>>>>>> COMMIT >>>>>>> >>>>>>> >>>>>>> Now consider the case we are trying to solve >>>>>>> >>>>>>> drop table tt1; >>>>>>> create table tt1(f1 int); >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> declare c50 cursor for select * from tt1; -- should show one row >>>>>>> only >>>>>>> insert into tt1 values(2); >>>>>>> fetch all from c50; >>>>>>> COMMIT; >>>>>>> >>>>>>> >>>>>>> Consider Data node 1 log >>>>>>> >>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>> committed READ WRITE] >>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>> committed READ WRITE] >>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>> committed READ WRITE] >>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() >>>>>>> FROM tt1] >>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>>>>>> >>>>>>> The cursor currently shows both inserted rows because command id at >>>>>>> data node in >>>>>>> step (j) is 0 >>>>>>> step (k) is 1 & >>>>>>> step (l) is 2 >>>>>>> >>>>>>> Where as we need command ids to be >>>>>>> >>>>>>> step (j) should be 0 >>>>>>> step (k) should be 2 & >>>>>>> step (l) should be 1 >>>>>>> >>>>>>> This will solve the cursor visibility problem. >>>>>>> >>>>>>> To implement this I suggest we send command IDs to data nodes from >>>>>>> the coordinator >>>>>>> like we send gxid. The only difference will be that we do not need >>>>>>> to take command IDs >>>>>>> from GTM since they are only valid with in the transaction. >>>>>>> >>>>>>> See this example >>>>>>> >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> ------+------+------+------+---- >>>>>>> (0 rows) >>>>>>> >>>>>>> test=# begin; >>>>>>> BEGIN >>>>>>> test=# insert into tt1 values(1); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> (1 row) >>>>>>> >>>>>>> test=# insert into tt1 values(2); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> (2 rows) >>>>>>> >>>>>>> test=# insert into tt1 values(3); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> (3 rows) >>>>>>> >>>>>>> test=# insert into tt1 values(4); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>> (4 rows) >>>>>>> >>>>>>> test=# end; >>>>>>> COMMIT >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>> (4 rows) >>>>>>> >>>>>>> test=# insert into tt1 values(5); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>>> (5 rows) >>>>>>> >>>>>>> test=# insert into tt1 values(6); >>>>>>> INSERT 0 1 >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>>> 80617 | 0 | 0 | 0 | 6 >>>>>>> (6 rows) >>>>>>> >>>>>>> Note that at the end of the multi-statement transaction the command >>>>>>> id gets reset to zero. >>>>>>> >>>>>>> -- >>>>>>> Abbas >>>>>>> Architect >>>>>>> EnterpriseDB Corporation >>>>>>> The Enterprise PostgreSQL Company >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> Abbas >>>>>> Architect >>>>>> EnterpriseDB Corporation >>>>>> The Enterprise PostgreSQL Company >>>>>> >>>>>> Phone: 92-334-5100153 >>>>>> >>>>>> Website: www.enterprisedb.com >>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>> >>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>> the individual or entity to whom it is addressed. This message >>>>>> contains information from EnterpriseDB Corporation that may be >>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>> law. If you are not the intended recipient or authorized to receive >>>>>> this for the intended recipient, any use, dissemination, distribution, >>>>>> retention, archiving, or copying of this communication is strictly >>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>> the sender immediately by reply e-mail and delete this message. >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------------------ >>>>>> Live Security Virtual Conference >>>>>> Exclusive live event will cover all the ways today's security and >>>>>> threat landscape has changed and how IT managers can respond. >>>>>> Discussions >>>>>> will include endpoint security, mobile security and the latest in >>>>>> malware >>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>> _______________________________________________ >>>>>> Postgres-xc-developers mailing list >>>>>> Pos...@li... >>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Michael Paquier >>>>> http://michael.otacoo.com >>>>> >>>> >>>> >>>> >>>> -- >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company >>>> >>>> Phone: 92-334-5100153 >>>> >>>> Website: www.enterprisedb.com >>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>> >>>> This e-mail message (and any attachment) is intended for the use of >>>> the individual or entity to whom it is addressed. This message >>>> contains information from EnterpriseDB Corporation that may be >>>> privileged, confidential, or exempt from disclosure under applicable >>>> law. If you are not the intended recipient or authorized to receive >>>> this for the intended recipient, any use, dissemination, distribution, >>>> retention, archiving, or copying of this communication is strictly >>>> prohibited. If you have received this e-mail in error, please notify >>>> the sender immediately by reply e-mail and delete this message. >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> will include endpoint security, mobile security and the latest in >>>> malware >>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... >>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Koichi S. <koi...@gm...> - 2012-07-10 01:54:16
|
I basically agree on the idea. I also think Michael needs this kind of extension to feedback trigger target rows. We should design NoticeResponse extensible enough for various usage. Regards; ---------- Koichi Suzuki 2012/7/10 Abbas Butt <abb...@en...>: > Here is the update on the issue. > > It was decided that the changes done by the data nodes in the command id > should be communicated back to the coordinator and that the coordinator > should choose the largest of all the received values as the next command id. > > It was suggested that we should check that a skipped value of command id > should not create a problem for the next operations on the table. I have > verified both by studying the code and by actually changing the function > CommandCounterIncrement to increment the command id by 3 and running > regression. It worked fine so a hole in command id is not a problem. > > Next it was suggested that we should use the mechanism currently in place to > send # of tuples affected by a statement to communicate the changed command > id to the coordinator. > Please refer to this link in the documentation > http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html > Note that there is no such message format that exists in the current over > the wire protocol to communicate the # of tuples affected by a statement. > The libpq functions that we might suspect do the same are PQntuples and > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where as > PQcmdTuples extracts the # of tuples affected from the CommandComplete 'C' > message string. We cannot use these mechanisms for our purpose. > > I evaluated the use of NoticeResponse 'N' for sending changed command id but > the message format of NoticeResponse mandates the use of certain fields > which will make our sent messages un-necessarily bulky and would consume > network bandwidth for no reason. > > I therefore suggest the we use a new message for communicating XC specific > information from data node to coordinator. Currently we will use it for > command id but we will design the message format flexible enough to > accommodate future XC requirements. Whenever the data node increments > command id we will send the information to the coordinator and > handle_response function in execRemote.c would be changed to accommodate new > message. Since coordinators will never use the new message therefore the > existing clients do not need to bother. > > Comments or suggestions are welcome. > > Regards > > > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en...> > wrote: >> >> While fixing the regression failures resulting from the changes done by >> the patch I was able to fix all except this test case >> >> set enforce_two_phase_commit = off; >> >> CREATE TEMP TABLE users ( >> id INT PRIMARY KEY, >> name VARCHAR NOT NULL >> ) DISTRIBUTE BY REPLICATION; >> >> INSERT INTO users VALUES (1, 'Jozko'); >> INSERT INTO users VALUES (2, 'Ferko'); >> INSERT INTO users VALUES (3, 'Samko'); >> CREATE TEMP TABLE tasks ( >> id INT PRIMARY KEY, >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL >> ) DISTRIBUTE BY REPLICATION; >> >> INSERT INTO tasks VALUES (1,1,NULL,NULL); >> INSERT INTO tasks VALUES (2,2,2,NULL); >> INSERT INTO tasks VALUES (3,3,3,3); >> >> BEGIN; >> >> UPDATE tasks set id=id WHERE id=2; >> SELECT * FROM tasks; >> >> DELETE FROM users WHERE id = 2; >> SELECT * FROM tasks; >> >> COMMIT; >> >> The obtained output from the last select statement is >> >> id | owner | worker | checked_by >> ----+-------+--------+------------ >> 1 | 1 | | >> 3 | 3 | 3 | 3 >> 2 | 2 | 2 | >> (3 rows) >> >> where as the expected output is >> >> id | owner | worker | checked_by >> ----+-------+--------+------------ >> 1 | 1 | | >> 3 | 3 | 3 | 3 >> 2 | | | >> (3 rows) >> >> Note that the owner and worker have been set to null due to "ON DELETE SET >> NULL". >> >> Here is the reason why this does not work properly. Consider the last >> transaction >> >> BEGIN; >> >> UPDATE tasks set id=id WHERE id=2; >> SELECT * FROM tasks; >> >> DELETE FROM users WHERE id = 2; >> SELECT * FROM tasks; >> >> COMMIT; >> >> Here are the command id values the coordinator sends to the data node >> >> 0 for the first update that gets incremented to 1 because this is a DML >> and needs to consume a command id >> 1 for the first select that remains 1 since it is not required to be >> consumed. >> 1 for the delete statement that gets incremented to 2 because it is a DML >> and 2 for the last select. >> >> Now this is what happens on the data node >> >> When the data node receives the first update with command id 0, it >> increments it once due to the update itself and once due to the update run >> because of "ON UPDATE CASCADE". Hence the command id at the end of update on >> data node is 2. >> The first select comes to data node with command id 1, which is incorrect. >> The user's intention is to see data after update and its command id should >> be 2. >> Now delete comes with command id 1, and data node increments it once due >> to the delete itself and once due to the update run because of "ON DELETE >> SET NULL", hence the command id at the end of delete is 3. >> Coordinator now sends last select with command id 2, which is again >> incorrect since user's intention is to see data after delete and select >> should have been sent to data node with command id 3 or 4. >> >> Every time data node increments command id due to any statements run >> implicitly either because of the constraints or triggers, this scheme of >> sending command ids to data node from coordinator to solve fetch problems >> would fail. >> >> Datanode can have a trigger e.g. inserting rows thrice on every single >> insert and would increment command id on every insert. Therefore this design >> cannot work. >> >> Either we have to synchronize command ids between datanode and coordinator >> through GTM >> OR >> We will have to send the DECLARE CURSOR down to the datanode. In this case >> however we will not be able to send the cursor query as it is because the >> query might contain a join on two tables which exist on a disjoint set of >> data nodes. >> >> Comments or suggestions are welcome. >> >> >> >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt <abb...@en...> >> wrote: >>> >>> Thanks for your comments. >>> >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat >>> <ash...@en...> wrote: >>>> >>>> Hi Abbas, >>>> I have few comments to make >>>> 1. With this patch there are two variables for having command Id, that >>>> is going to cause confusion and will be a maintenance burden, might be error >>>> prone. Is it possible to use a single variable instead of two? >>> >>> >>> Are you talking about receivedCommandId and currentCommandId? If yes, I >>> would prefer not having a packet received from coordinator overwrite the >>> currentCommandId at data node, because I am not 100% sure about the life >>> time of currentCommandId, I might overwrite it before time. It would be safe >>> to let currentCommandId as is unless we are compelled to get the next >>> command ID, and have the received command id take priority at that time. >>> >>>> >>>> Right now there is some code which is specific to cursors in your patch. >>>> If you can plug the coordinator command id somehow into currentCommandId, >>>> you won't need that code and any other code which needs coordinator command >>>> ID will be automatically taken care of. >>> >>> >>> That code is required to solve a problem. Consider this case when a >>> coordinator received this transaction >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(2); >>> fetch all from c50; >>> COMMIT; >>> >>> While sending select to the data node in response to a fetch we need to >>> know what was the command ID of the declare cursor statement and we need to >>> send that command ID to the data node for this particular fetch. This is the >>> main idea behind this solution. >>> >>> The first insert goes to the data node with command id 0, the second >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When >>> coordinator sees fetch it needs to send select to the data node with command >>> ID 1 rather than 3. >>> >>> >>>> >>>> 2. A non-transaction on coordinator can spawn tranasactions on datanode >>>> or subtransactions (if there is already a transaction running). Does your >>>> patch handle that case? >>> >>> >>> No and it does not need to, because that case has no known problems that >>> we need to solve. I don't think my patch would impact any such case but I >>> will analyze any failures that I may get in regressions. >>> >>>> >>>> Should we do more thorough research in the transaction management, esp. >>>> to see the impact of getting same command id for two commands on the >>>> datanode? >>> >>> >>> If we issue two commands with the same command ID then we will definitely >>> have visibility issues according to the rules I have already explained. But >>> we will not have two commands sent to the data node with same command id. >>> >>>> >>>> >>>> >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt >>>> <abb...@en...> wrote: >>>>> >>>>> Hi Ashutosh, >>>>> Here are the results with the val column, Thanks. >>>>> >>>>> test=# drop table mvcc_demo; >>>>> DROP TABLE >>>>> test=# >>>>> test=# create table mvcc_demo (val int); >>>>> CREATE TABLE >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> test=# >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>>> combo id would be different >>>>> DELETE 0 >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 0 >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 0 >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80689 | 0 | 3 | f >>>>> 80689 | 0 | 4 | f >>>>> 80689 | 0 | 5 | f >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80689 | 0 | 3 | 3 | 1 >>>>> 80689 | 0 | 4 | 4 | 2 >>>>> 80689 | 0 | 5 | 5 | 3 >>>>> >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 3 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80689 | 80689 | 0 | t >>>>> 80689 | 80689 | 1 | t >>>>> 80689 | 80689 | 2 | t >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> ------+------+------+------+----- >>>>> (0 rows) >>>>> >>>>> >>>>> test=# >>>>> test=# END; >>>>> COMMIT >>>>> test=# >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80693 | 0 | 0 | f >>>>> 80693 | 0 | 1 | f >>>>> 80693 | 0 | 2 | f >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80693 | 0 | 0 | 0 | 1 >>>>> 80693 | 0 | 1 | 1 | 2 >>>>> 80693 | 0 | 2 | 2 | 3 >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# UPDATE mvcc_demo SET val = 10; >>>>> >>>>> UPDATE 3 >>>>> test=# >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80693 | 80693 | 0 | t >>>>> 80693 | 80693 | 1 | t >>>>> 80693 | 80693 | 2 | t >>>>> 80693 | 0 | 3 | f >>>>> 80693 | 0 | 3 | f >>>>> 80693 | 0 | 3 | f >>>>> (6 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80693 | 0 | 3 | 3 | 10 >>>>> 80693 | 0 | 3 | 3 | 10 >>>>> 80693 | 0 | 3 | 3 | 10 >>>>> (3 rows) >>>>> >>>>> >>>>> test=# >>>>> test=# END; >>>>> COMMIT >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- From one psql issue >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80699 | 0 | 0 | f >>>>> (1 row) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80699 | 0 | 0 | 0 | 1 >>>>> (1 row) >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> test=# -- From another issue >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80699 | 0 | 0 | f >>>>> 80700 | 0 | 0 | f >>>>> 80700 | 0 | 1 | f >>>>> 80700 | 0 | 2 | f >>>>> (4 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80699 | 0 | 0 | 0 | 1 >>>>> 80700 | 0 | 0 | 0 | 2 >>>>> 80700 | 0 | 1 | 1 | 3 >>>>> 80700 | 0 | 2 | 2 | 4 >>>>> (4 rows) >>>>> >>>>> test=# >>>>> test=# UPDATE mvcc_demo SET val = 10; >>>>> >>>>> UPDATE 4 >>>>> test=# >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80700 | 80700 | 0 | t >>>>> 80700 | 80700 | 1 | t >>>>> 80700 | 80700 | 2 | t >>>>> 80699 | 80700 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> (8 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80700 | 0 | 3 | 3 | 10 >>>>> 80700 | 0 | 3 | 3 | 10 >>>>> 80700 | 0 | 3 | 3 | 10 >>>>> 80700 | 0 | 3 | 3 | 10 >>>>> (4 rows) >>>>> >>>>> >>>>> >>>>> >>>>> test=# -- Before finishing this, issue these from the first psql >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80700 | 80700 | 0 | t >>>>> 80700 | 80700 | 1 | t >>>>> 80700 | 80700 | 2 | t >>>>> 80699 | 80700 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> (8 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+-------+------+------+----- >>>>> 80699 | 80700 | 3 | 3 | 1 >>>>> (1 row) >>>>> >>>>> test=# end; >>>>> COMMIT >>>>> >>>>> >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier >>>>> <mic...@gm...> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, so >>>>>> be sure to send it to remote Datanodes really only if necessary. >>>>>> You should put more severe conditions blocking this function cid can >>>>>> easily get incremented in Postgres. >>>>>> >>>>>> Regards, >>>>>> >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt >>>>>> <abb...@en...> wrote: >>>>>>> >>>>>>> PFA a WIP patch implementing the design presented earlier. >>>>>>> The patch is WIP because it still has and FIXME and it shows some >>>>>>> regression failures that need to be fixed, but other than that it confirms >>>>>>> that the suggested design would work fine. The following test cases now work >>>>>>> fine >>>>>>> >>>>>>> drop table tt1; >>>>>>> create table tt1(f1 int) distribute by replication; >>>>>>> >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> insert into tt1 values(2); >>>>>>> fetch all from c50; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> BEGIN; >>>>>>> >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> insert into tt1 values(1); >>>>>>> >>>>>>> insert into tt1 values(2); >>>>>>> fetch all from c50; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> insert into tt1 values(2); >>>>>>> >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> insert into tt1 values(3); >>>>>>> >>>>>>> fetch all from c50; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> insert into tt1 values(2); >>>>>>> declare c51 cursor for select * from tt1; >>>>>>> insert into tt1 values(3); >>>>>>> fetch all from c50; >>>>>>> fetch all from c51; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> declare c51 cursor for select * from tt1; >>>>>>> insert into tt1 values(2); >>>>>>> insert into tt1 values(3); >>>>>>> fetch all from c50; >>>>>>> fetch all from c51; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt >>>>>>> <abb...@en...> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> In a multi-statement transaction each statement is given a command >>>>>>>> identifier >>>>>>>> starting from zero and incrementing for each statement. >>>>>>>> These command indentifers are required for extra tracking because >>>>>>>> each >>>>>>>> statement has its own visibility rules with in the transaction. >>>>>>>> For example, a cursor’s contents must remain unchanged even if later >>>>>>>> statements in the >>>>>>>> same transaction modify rows. Such tracking is implemented using >>>>>>>> system command id >>>>>>>> columns cmin/cmax, which is internally actually is a single column. >>>>>>>> >>>>>>>> cmin/cmax come into play in case of multi-statement transactions >>>>>>>> only, >>>>>>>> they are both zero otherwise. >>>>>>>> >>>>>>>> cmin "The command identifier of the statement within the inserting >>>>>>>> transaction." >>>>>>>> cmax "The command identifier of the statement within the deleting >>>>>>>> transaction." >>>>>>>> >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) >>>>>>>> >>>>>>>> ( // A heap tuple is valid >>>>>>>> "now" iff >>>>>>>> Xmin == my-transaction && // inserted by the current >>>>>>>> transaction >>>>>>>> Cmin < my-command && // before this command, and >>>>>>>> ( >>>>>>>> Xmax is null || // the row has not been >>>>>>>> deleted, or >>>>>>>> ( >>>>>>>> Xmax == my-transaction && // it was deleted by the >>>>>>>> current transaction >>>>>>>> Cmax >= my-command // but not before this >>>>>>>> command, >>>>>>>> ) >>>>>>>> ) >>>>>>>> ) >>>>>>>> || // or >>>>>>>> ( >>>>>>>> Xmin is committed && // the row was inserted by >>>>>>>> a committed transaction, and >>>>>>>> ( >>>>>>>> Xmax is null || // the row has not been >>>>>>>> deleted, or >>>>>>>> ( >>>>>>>> Xmax == my-transaction && // the row is being deleted >>>>>>>> by this transaction >>>>>>>> Cmax >= my-command) || // but it's not deleted >>>>>>>> "yet", or >>>>>>>> ( >>>>>>>> Xmax != my-transaction && // the row was deleted by >>>>>>>> another transaction >>>>>>>> Xmax is not committed // that has not been >>>>>>>> committed >>>>>>>> ) >>>>>>>> ) >>>>>>>> ) >>>>>>>> ) >>>>>>>> >>>>>>>> Because cmin and cmax are internally a single system column, >>>>>>>> it is therefore not possible to simply record the status of a row >>>>>>>> that is created and expired in the same multi-statement transaction. >>>>>>>> For that reason, a special combo command id is created that >>>>>>>> references >>>>>>>> a local memory hash that contains the actual cmin and cmax values. >>>>>>>> It means that if combo id is being used the number we are seeing >>>>>>>> would not be the cmin or cmax it will be an index into a local >>>>>>>> array that contains a structure with has the actual cmin and cmax >>>>>>>> values. >>>>>>>> >>>>>>>> The following queries (taken mostly from >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>>>>>>> use the contrib module pageinspect, which allows >>>>>>>> visibility of internal heap page structures and all stored rows, >>>>>>>> including those not visible in the current snapshot. >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>>>>>>> >>>>>>>> We are exploring 3 examples here: >>>>>>>> 1) INSERT & DELETE in a single transaction >>>>>>>> 2) INSERT & UPDATE in a single transaction >>>>>>>> 3) INSERT from two different transactions & UPDATE from one >>>>>>>> >>>>>>>> test=# drop table mvcc_demo; >>>>>>>> DROP TABLE >>>>>>>> test=# >>>>>>>> test=# create table mvcc_demo (val int); >>>>>>>> CREATE TABLE >>>>>>>> test=# >>>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>>> TRUNCATE TABLE >>>>>>>> test=# >>>>>>>> test=# BEGIN; >>>>>>>> BEGIN >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>>>>>> combo id would be different >>>>>>>> DELETE 0 >>>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>>> DELETE 0 >>>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>>> DELETE 0 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>>> INSERT 0 1 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+------+-----------+------------- >>>>>>>> 80685 | 0 | 3 | f >>>>>>>> 80685 | 0 | 4 | f >>>>>>>> 80685 | 0 | 5 | f >>>>>>>> (3 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>>> DELETE 3 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+-------+-----------+------------- >>>>>>>> 80685 | 80685 | 0 | t >>>>>>>> 80685 | 80685 | 1 | t >>>>>>>> 80685 | 80685 | 2 | t >>>>>>>> (3 rows) >>>>>>>> >>>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax >>>>>>>> they are actually >>>>>>>> the indexes of the internal array already explained above. >>>>>>>> combo id index 0 would contain cmin 3, cmax 6 >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# END; >>>>>>>> COMMIT >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>>> TRUNCATE TABLE >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# BEGIN; >>>>>>>> BEGIN >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>>> INSERT 0 1 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+------+-----------+------------- >>>>>>>> 80675 | 0 | 0 | f >>>>>>>> 80675 | 0 | 1 | f >>>>>>>> 80675 | 0 | 2 | f >>>>>>>> (3 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>>>> UPDATE 3 >>>>>>>> test=# >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+-------+-----------+------------- >>>>>>>> 80675 | 80675 | 0 | t >>>>>>>> 80675 | 80675 | 1 | t >>>>>>>> 80675 | 80675 | 2 | t >>>>>>>> 80675 | 0 | 3 | f >>>>>>>> 80675 | 0 | 3 | f >>>>>>>> 80675 | 0 | 3 | f >>>>>>>> (6 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# END; >>>>>>>> COMMIT >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>>> TRUNCATE TABLE >>>>>>>> test=# >>>>>>>> >>>>>>>> -- From one psql issue >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>>> INSERT 0 1 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+------+-----------+------------- >>>>>>>> 80677 | 0 | 0 | f >>>>>>>> (1 row) >>>>>>>> >>>>>>>> >>>>>>>> test=# -- From another issue >>>>>>>> test=# BEGIN; >>>>>>>> BEGIN >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>>>>>> INSERT 0 1 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+------+-----------+------------- >>>>>>>> 80677 | 0 | 0 | f >>>>>>>> 80678 | 0 | 0 | f >>>>>>>> 80678 | 0 | 1 | f >>>>>>>> 80678 | 0 | 2 | f >>>>>>>> (4 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>>>> UPDATE 4 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+-------+-----------+------------- >>>>>>>> 80678 | 80678 | 0 | t >>>>>>>> 80678 | 80678 | 1 | t >>>>>>>> 80678 | 80678 | 2 | t >>>>>>>> 80677 | 80678 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> (8 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> >>>>>>>> test=# -- Before finishing this, issue these from the first psql >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+-------+-----------+------------- >>>>>>>> 80678 | 80678 | 0 | t >>>>>>>> 80678 | 80678 | 1 | t >>>>>>>> 80678 | 80678 | 2 | t >>>>>>>> 80677 | 80678 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> (8 rows) >>>>>>>> >>>>>>>> test=# END; >>>>>>>> COMMIT >>>>>>>> >>>>>>>> >>>>>>>> Now consider the case we are trying to solve >>>>>>>> >>>>>>>> drop table tt1; >>>>>>>> create table tt1(f1 int); >>>>>>>> >>>>>>>> BEGIN; >>>>>>>> insert into tt1 values(1); >>>>>>>> declare c50 cursor for select * from tt1; -- should show one row >>>>>>>> only >>>>>>>> insert into tt1 values(2); >>>>>>>> fetch all from c50; >>>>>>>> COMMIT; >>>>>>>> >>>>>>>> >>>>>>>> Consider Data node 1 log >>>>>>>> >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>>> committed READ WRITE] >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>>> committed READ WRITE] >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>>> committed READ WRITE] >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() >>>>>>>> FROM tt1] >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>>>>>>> >>>>>>>> The cursor currently shows both inserted rows because command id at >>>>>>>> data node in >>>>>>>> step (j) is 0 >>>>>>>> step (k) is 1 & >>>>>>>> step (l) is 2 >>>>>>>> >>>>>>>> Where as we need command ids to be >>>>>>>> >>>>>>>> step (j) should be 0 >>>>>>>> step (k) should be 2 & >>>>>>>> step (l) should be 1 >>>>>>>> >>>>>>>> This will solve the cursor visibility problem. >>>>>>>> >>>>>>>> To implement this I suggest we send command IDs to data nodes from >>>>>>>> the coordinator >>>>>>>> like we send gxid. The only difference will be that we do not need >>>>>>>> to take command IDs >>>>>>>> from GTM since they are only valid with in the transaction. >>>>>>>> >>>>>>>> See this example >>>>>>>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> ------+------+------+------+---- >>>>>>>> (0 rows) >>>>>>>> >>>>>>>> test=# begin; >>>>>>>> BEGIN >>>>>>>> test=# insert into tt1 values(1); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> (1 row) >>>>>>>> >>>>>>>> test=# insert into tt1 values(2); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> (2 rows) >>>>>>>> >>>>>>>> test=# insert into tt1 values(3); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> (3 rows) >>>>>>>> >>>>>>>> test=# insert into tt1 values(4); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>>> (4 rows) >>>>>>>> >>>>>>>> test=# end; >>>>>>>> COMMIT >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>>> (4 rows) >>>>>>>> >>>>>>>> test=# insert into tt1 values(5); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>>>> (5 rows) >>>>>>>> >>>>>>>> test=# insert into tt1 values(6); >>>>>>>> INSERT 0 1 >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>>>> 80617 | 0 | 0 | 0 | 6 >>>>>>>> (6 rows) >>>>>>>> >>>>>>>> Note that at the end of the multi-statement transaction the command >>>>>>>> id gets reset to zero. >>>>>>>> >>>>>>>> -- >>>>>>>> Abbas >>>>>>>> Architect >>>>>>>> EnterpriseDB Corporation >>>>>>>> The Enterprise PostgreSQL Company >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> -- >>>>>>> Abbas >>>>>>> Architect >>>>>>> EnterpriseDB Corporation >>>>>>> The Enterprise PostgreSQL Company >>>>>>> >>>>>>> Phone: 92-334-5100153 >>>>>>> >>>>>>> Website: www.enterprisedb.com >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>>> >>>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>>> the individual or entity to whom it is addressed. This message >>>>>>> contains information from EnterpriseDB Corporation that may be >>>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>>> law. If you are not the intended recipient or authorized to receive >>>>>>> this for the intended recipient, any use, dissemination, >>>>>>> distribution, >>>>>>> retention, archiving, or copying of this communication is strictly >>>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>>> the sender immediately by reply e-mail and delete this message. >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> Live Security Virtual Conference >>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>> Discussions >>>>>>> will include endpoint security, mobile security and the latest in >>>>>>> malware >>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>> _______________________________________________ >>>>>>> Postgres-xc-developers mailing list >>>>>>> Pos...@li... >>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Michael Paquier >>>>>> http://michael.otacoo.com >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>>> >>>>> Phone: 92-334-5100153 >>>>> >>>>> Website: www.enterprisedb.com >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>> >>>>> This e-mail message (and any attachment) is intended for the use of >>>>> the individual or entity to whom it is addressed. This message >>>>> contains information from EnterpriseDB Corporation that may be >>>>> privileged, confidential, or exempt from disclosure under applicable >>>>> law. If you are not the intended recipient or authorized to receive >>>>> this for the intended recipient, any use, dissemination, distribution, >>>>> retention, archiving, or copying of this communication is strictly >>>>> prohibited. If you have received this e-mail in error, please notify >>>>> the sender immediately by reply e-mail and delete this message. >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> Live Security Virtual Conference >>>>> Exclusive live event will cover all the ways today's security and >>>>> threat landscape has changed and how IT managers can respond. >>>>> Discussions >>>>> will include endpoint security, mobile security and the latest in >>>>> malware >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>> _______________________________________________ >>>>> Postgres-xc-developers mailing list >>>>> Pos...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>> >>> >>> >>> -- >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >>> >>> Phone: 92-334-5100153 >>> >>> Website: www.enterprisedb.com >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. >> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. > > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
From: Abbas B. <abb...@en...> - 2012-07-18 22:56:48
Attachments:
15_cvp.patch
|
Hi, Attached please find the patch that solves cursor visibility problem as well as the inserting into child by selecting from the parent problem using the design discussed earlier. Test cases are added and regressions are fixed accordingly. Patch is ready for review. Thanks. On Tue, Jul 10, 2012 at 6:54 AM, Koichi Suzuki <koi...@gm...>wrote: > I basically agree on the idea. I also think Michael needs this kind > of extension to feedback trigger target rows. We should design > NoticeResponse extensible enough for various usage. > > Regards; > ---------- > Koichi Suzuki > > > 2012/7/10 Abbas Butt <abb...@en...>: > > Here is the update on the issue. > > > > It was decided that the changes done by the data nodes in the command id > > should be communicated back to the coordinator and that the coordinator > > should choose the largest of all the received values as the next command > id. > > > > It was suggested that we should check that a skipped value of command id > > should not create a problem for the next operations on the table. I have > > verified both by studying the code and by actually changing the function > > CommandCounterIncrement to increment the command id by 3 and running > > regression. It worked fine so a hole in command id is not a problem. > > > > Next it was suggested that we should use the mechanism currently in > place to > > send # of tuples affected by a statement to communicate the changed > command > > id to the coordinator. > > Please refer to this link in the documentation > > http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html > > Note that there is no such message format that exists in the current > over > > the wire protocol to communicate the # of tuples affected by a statement. > > The libpq functions that we might suspect do the same are PQntuples and > > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where as > > PQcmdTuples extracts the # of tuples affected from the CommandComplete > 'C' > > message string. We cannot use these mechanisms for our purpose. > > > > I evaluated the use of NoticeResponse 'N' for sending changed command id > but > > the message format of NoticeResponse mandates the use of certain fields > > which will make our sent messages un-necessarily bulky and would consume > > network bandwidth for no reason. > > > > I therefore suggest the we use a new message for communicating XC > specific > > information from data node to coordinator. Currently we will use it for > > command id but we will design the message format flexible enough to > > accommodate future XC requirements. Whenever the data node increments > > command id we will send the information to the coordinator and > > handle_response function in execRemote.c would be changed to accommodate > new > > message. Since coordinators will never use the new message therefore the > > existing clients do not need to bother. > > > > Comments or suggestions are welcome. > > > > Regards > > > > > > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en...> > > wrote: > >> > >> While fixing the regression failures resulting from the changes done by > >> the patch I was able to fix all except this test case > >> > >> set enforce_two_phase_commit = off; > >> > >> CREATE TEMP TABLE users ( > >> id INT PRIMARY KEY, > >> name VARCHAR NOT NULL > >> ) DISTRIBUTE BY REPLICATION; > >> > >> INSERT INTO users VALUES (1, 'Jozko'); > >> INSERT INTO users VALUES (2, 'Ferko'); > >> INSERT INTO users VALUES (3, 'Samko'); > >> CREATE TEMP TABLE tasks ( > >> id INT PRIMARY KEY, > >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, > >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, > >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL > >> ) DISTRIBUTE BY REPLICATION; > >> > >> INSERT INTO tasks VALUES (1,1,NULL,NULL); > >> INSERT INTO tasks VALUES (2,2,2,NULL); > >> INSERT INTO tasks VALUES (3,3,3,3); > >> > >> BEGIN; > >> > >> UPDATE tasks set id=id WHERE id=2; > >> SELECT * FROM tasks; > >> > >> DELETE FROM users WHERE id = 2; > >> SELECT * FROM tasks; > >> > >> COMMIT; > >> > >> The obtained output from the last select statement is > >> > >> id | owner | worker | checked_by > >> ----+-------+--------+------------ > >> 1 | 1 | | > >> 3 | 3 | 3 | 3 > >> 2 | 2 | 2 | > >> (3 rows) > >> > >> where as the expected output is > >> > >> id | owner | worker | checked_by > >> ----+-------+--------+------------ > >> 1 | 1 | | > >> 3 | 3 | 3 | 3 > >> 2 | | | > >> (3 rows) > >> > >> Note that the owner and worker have been set to null due to "ON DELETE > SET > >> NULL". > >> > >> Here is the reason why this does not work properly. Consider the last > >> transaction > >> > >> BEGIN; > >> > >> UPDATE tasks set id=id WHERE id=2; > >> SELECT * FROM tasks; > >> > >> DELETE FROM users WHERE id = 2; > >> SELECT * FROM tasks; > >> > >> COMMIT; > >> > >> Here are the command id values the coordinator sends to the data node > >> > >> 0 for the first update that gets incremented to 1 because this is a DML > >> and needs to consume a command id > >> 1 for the first select that remains 1 since it is not required to be > >> consumed. > >> 1 for the delete statement that gets incremented to 2 because it is a > DML > >> and 2 for the last select. > >> > >> Now this is what happens on the data node > >> > >> When the data node receives the first update with command id 0, it > >> increments it once due to the update itself and once due to the update > run > >> because of "ON UPDATE CASCADE". Hence the command id at the end of > update on > >> data node is 2. > >> The first select comes to data node with command id 1, which is > incorrect. > >> The user's intention is to see data after update and its command id > should > >> be 2. > >> Now delete comes with command id 1, and data node increments it once due > >> to the delete itself and once due to the update run because of "ON > DELETE > >> SET NULL", hence the command id at the end of delete is 3. > >> Coordinator now sends last select with command id 2, which is again > >> incorrect since user's intention is to see data after delete and select > >> should have been sent to data node with command id 3 or 4. > >> > >> Every time data node increments command id due to any statements run > >> implicitly either because of the constraints or triggers, this scheme of > >> sending command ids to data node from coordinator to solve fetch > problems > >> would fail. > >> > >> Datanode can have a trigger e.g. inserting rows thrice on every single > >> insert and would increment command id on every insert. Therefore this > design > >> cannot work. > >> > >> Either we have to synchronize command ids between datanode and > coordinator > >> through GTM > >> OR > >> We will have to send the DECLARE CURSOR down to the datanode. In this > case > >> however we will not be able to send the cursor query as it is because > the > >> query might contain a join on two tables which exist on a disjoint set > of > >> data nodes. > >> > >> Comments or suggestions are welcome. > >> > >> > >> > >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt < > abb...@en...> > >> wrote: > >>> > >>> Thanks for your comments. > >>> > >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat > >>> <ash...@en...> wrote: > >>>> > >>>> Hi Abbas, > >>>> I have few comments to make > >>>> 1. With this patch there are two variables for having command Id, that > >>>> is going to cause confusion and will be a maintenance burden, might > be error > >>>> prone. Is it possible to use a single variable instead of two? > >>> > >>> > >>> Are you talking about receivedCommandId and currentCommandId? If yes, I > >>> would prefer not having a packet received from coordinator overwrite > the > >>> currentCommandId at data node, because I am not 100% sure about the > life > >>> time of currentCommandId, I might overwrite it before time. It would > be safe > >>> to let currentCommandId as is unless we are compelled to get the next > >>> command ID, and have the received command id take priority at that > time. > >>> > >>>> > >>>> Right now there is some code which is specific to cursors in your > patch. > >>>> If you can plug the coordinator command id somehow into > currentCommandId, > >>>> you won't need that code and any other code which needs coordinator > command > >>>> ID will be automatically taken care of. > >>> > >>> > >>> That code is required to solve a problem. Consider this case when a > >>> coordinator received this transaction > >>> > >>> > >>> BEGIN; > >>> insert into tt1 values(1); > >>> declare c50 cursor for select * from tt1; > >>> insert into tt1 values(2); > >>> fetch all from c50; > >>> COMMIT; > >>> > >>> While sending select to the data node in response to a fetch we need to > >>> know what was the command ID of the declare cursor statement and we > need to > >>> send that command ID to the data node for this particular fetch. This > is the > >>> main idea behind this solution. > >>> > >>> The first insert goes to the data node with command id 0, the second > >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When > >>> coordinator sees fetch it needs to send select to the data node with > command > >>> ID 1 rather than 3. > >>> > >>> > >>>> > >>>> 2. A non-transaction on coordinator can spawn tranasactions on > datanode > >>>> or subtransactions (if there is already a transaction running). Does > your > >>>> patch handle that case? > >>> > >>> > >>> No and it does not need to, because that case has no known problems > that > >>> we need to solve. I don't think my patch would impact any such case > but I > >>> will analyze any failures that I may get in regressions. > >>> > >>>> > >>>> Should we do more thorough research in the transaction management, > esp. > >>>> to see the impact of getting same command id for two commands on the > >>>> datanode? > >>> > >>> > >>> If we issue two commands with the same command ID then we will > definitely > >>> have visibility issues according to the rules I have already > explained. But > >>> we will not have two commands sent to the data node with same command > id. > >>> > >>>> > >>>> > >>>> > >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt > >>>> <abb...@en...> wrote: > >>>>> > >>>>> Hi Ashutosh, > >>>>> Here are the results with the val column, Thanks. > >>>>> > >>>>> test=# drop table mvcc_demo; > >>>>> DROP TABLE > >>>>> test=# > >>>>> test=# create table mvcc_demo (val int); > >>>>> CREATE TABLE > >>>>> test=# > >>>>> test=# TRUNCATE mvcc_demo; > >>>>> TRUNCATE TABLE > >>>>> test=# > >>>>> test=# BEGIN; > >>>>> BEGIN > >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that > >>>>> combo id would be different > >>>>> DELETE 0 > >>>>> test=# DELETE FROM mvcc_demo; > >>>>> DELETE 0 > >>>>> test=# DELETE FROM mvcc_demo; > >>>>> DELETE 0 > >>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>> INSERT 0 1 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+------+-----------+------------- > >>>>> 80689 | 0 | 3 | f > >>>>> 80689 | 0 | 4 | f > >>>>> 80689 | 0 | 5 | f > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80689 | 0 | 3 | 3 | 1 > >>>>> 80689 | 0 | 4 | 4 | 2 > >>>>> 80689 | 0 | 5 | 5 | 3 > >>>>> > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# DELETE FROM mvcc_demo; > >>>>> DELETE 3 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+-------+-----------+------------- > >>>>> 80689 | 80689 | 0 | t > >>>>> 80689 | 80689 | 1 | t > >>>>> 80689 | 80689 | 2 | t > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> ------+------+------+------+----- > >>>>> (0 rows) > >>>>> > >>>>> > >>>>> test=# > >>>>> test=# END; > >>>>> COMMIT > >>>>> test=# > >>>>> test=# > >>>>> test=# TRUNCATE mvcc_demo; > >>>>> TRUNCATE TABLE > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> test=# BEGIN; > >>>>> BEGIN > >>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>> INSERT 0 1 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+------+-----------+------------- > >>>>> 80693 | 0 | 0 | f > >>>>> 80693 | 0 | 1 | f > >>>>> 80693 | 0 | 2 | f > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80693 | 0 | 0 | 0 | 1 > >>>>> 80693 | 0 | 1 | 1 | 2 > >>>>> 80693 | 0 | 2 | 2 | 3 > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# UPDATE mvcc_demo SET val = 10; > >>>>> > >>>>> UPDATE 3 > >>>>> test=# > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+-------+-----------+------------- > >>>>> 80693 | 80693 | 0 | t > >>>>> 80693 | 80693 | 1 | t > >>>>> 80693 | 80693 | 2 | t > >>>>> 80693 | 0 | 3 | f > >>>>> 80693 | 0 | 3 | f > >>>>> 80693 | 0 | 3 | f > >>>>> (6 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80693 | 0 | 3 | 3 | 10 > >>>>> 80693 | 0 | 3 | 3 | 10 > >>>>> 80693 | 0 | 3 | 3 | 10 > >>>>> (3 rows) > >>>>> > >>>>> > >>>>> test=# > >>>>> test=# END; > >>>>> COMMIT > >>>>> test=# > >>>>> test=# TRUNCATE mvcc_demo; > >>>>> TRUNCATE TABLE > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- From one psql issue > >>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>> INSERT 0 1 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+------+-----------+------------- > >>>>> 80699 | 0 | 0 | f > >>>>> (1 row) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80699 | 0 | 0 | 0 | 1 > >>>>> (1 row) > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> test=# -- From another issue > >>>>> test=# BEGIN; > >>>>> BEGIN > >>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (4); > >>>>> INSERT 0 1 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+------+-----------+------------- > >>>>> 80699 | 0 | 0 | f > >>>>> 80700 | 0 | 0 | f > >>>>> 80700 | 0 | 1 | f > >>>>> 80700 | 0 | 2 | f > >>>>> (4 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80699 | 0 | 0 | 0 | 1 > >>>>> 80700 | 0 | 0 | 0 | 2 > >>>>> 80700 | 0 | 1 | 1 | 3 > >>>>> 80700 | 0 | 2 | 2 | 4 > >>>>> (4 rows) > >>>>> > >>>>> test=# > >>>>> test=# UPDATE mvcc_demo SET val = 10; > >>>>> > >>>>> UPDATE 4 > >>>>> test=# > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+-------+-----------+------------- > >>>>> 80700 | 80700 | 0 | t > >>>>> 80700 | 80700 | 1 | t > >>>>> 80700 | 80700 | 2 | t > >>>>> 80699 | 80700 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> (8 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80700 | 0 | 3 | 3 | 10 > >>>>> 80700 | 0 | 3 | 3 | 10 > >>>>> 80700 | 0 | 3 | 3 | 10 > >>>>> 80700 | 0 | 3 | 3 | 10 > >>>>> (4 rows) > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> test=# -- Before finishing this, issue these from the first psql > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+-------+-----------+------------- > >>>>> 80700 | 80700 | 0 | t > >>>>> 80700 | 80700 | 1 | t > >>>>> 80700 | 80700 | 2 | t > >>>>> 80699 | 80700 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> (8 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+-------+------+------+----- > >>>>> 80699 | 80700 | 3 | 3 | 1 > >>>>> (1 row) > >>>>> > >>>>> test=# end; > >>>>> COMMIT > >>>>> > >>>>> > >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier > >>>>> <mic...@gm...> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, > so > >>>>>> be sure to send it to remote Datanodes really only if necessary. > >>>>>> You should put more severe conditions blocking this function cid can > >>>>>> easily get incremented in Postgres. > >>>>>> > >>>>>> Regards, > >>>>>> > >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt > >>>>>> <abb...@en...> wrote: > >>>>>>> > >>>>>>> PFA a WIP patch implementing the design presented earlier. > >>>>>>> The patch is WIP because it still has and FIXME and it shows some > >>>>>>> regression failures that need to be fixed, but other than that it > confirms > >>>>>>> that the suggested design would work fine. The following test > cases now work > >>>>>>> fine > >>>>>>> > >>>>>>> drop table tt1; > >>>>>>> create table tt1(f1 int) distribute by replication; > >>>>>>> > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> insert into tt1 values(1); > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> insert into tt1 values(2); > >>>>>>> fetch all from c50; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> insert into tt1 values(1); > >>>>>>> > >>>>>>> insert into tt1 values(2); > >>>>>>> fetch all from c50; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> insert into tt1 values(1); > >>>>>>> insert into tt1 values(2); > >>>>>>> > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> insert into tt1 values(3); > >>>>>>> > >>>>>>> fetch all from c50; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> insert into tt1 values(1); > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> insert into tt1 values(2); > >>>>>>> declare c51 cursor for select * from tt1; > >>>>>>> insert into tt1 values(3); > >>>>>>> fetch all from c50; > >>>>>>> fetch all from c51; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> insert into tt1 values(1); > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> declare c51 cursor for select * from tt1; > >>>>>>> insert into tt1 values(2); > >>>>>>> insert into tt1 values(3); > >>>>>>> fetch all from c50; > >>>>>>> fetch all from c51; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> > >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt > >>>>>>> <abb...@en...> wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> In a multi-statement transaction each statement is given a command > >>>>>>>> identifier > >>>>>>>> starting from zero and incrementing for each statement. > >>>>>>>> These command indentifers are required for extra tracking because > >>>>>>>> each > >>>>>>>> statement has its own visibility rules with in the transaction. > >>>>>>>> For example, a cursor’s contents must remain unchanged even if > later > >>>>>>>> statements in the > >>>>>>>> same transaction modify rows. Such tracking is implemented using > >>>>>>>> system command id > >>>>>>>> columns cmin/cmax, which is internally actually is a single > column. > >>>>>>>> > >>>>>>>> cmin/cmax come into play in case of multi-statement transactions > >>>>>>>> only, > >>>>>>>> they are both zero otherwise. > >>>>>>>> > >>>>>>>> cmin "The command identifier of the statement within the inserting > >>>>>>>> transaction." > >>>>>>>> cmax "The command identifier of the statement within the deleting > >>>>>>>> transaction." > >>>>>>>> > >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) > >>>>>>>> > >>>>>>>> ( // A heap tuple is valid > >>>>>>>> "now" iff > >>>>>>>> Xmin == my-transaction && // inserted by the > current > >>>>>>>> transaction > >>>>>>>> Cmin < my-command && // before this command, > and > >>>>>>>> ( > >>>>>>>> Xmax is null || // the row has not been > >>>>>>>> deleted, or > >>>>>>>> ( > >>>>>>>> Xmax == my-transaction && // it was deleted by the > >>>>>>>> current transaction > >>>>>>>> Cmax >= my-command // but not before this > >>>>>>>> command, > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> || // or > >>>>>>>> ( > >>>>>>>> Xmin is committed && // the row was inserted > by > >>>>>>>> a committed transaction, and > >>>>>>>> ( > >>>>>>>> Xmax is null || // the row has not been > >>>>>>>> deleted, or > >>>>>>>> ( > >>>>>>>> Xmax == my-transaction && // the row is being > deleted > >>>>>>>> by this transaction > >>>>>>>> Cmax >= my-command) || // but it's not deleted > >>>>>>>> "yet", or > >>>>>>>> ( > >>>>>>>> Xmax != my-transaction && // the row was deleted by > >>>>>>>> another transaction > >>>>>>>> Xmax is not committed // that has not been > >>>>>>>> committed > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> > >>>>>>>> Because cmin and cmax are internally a single system column, > >>>>>>>> it is therefore not possible to simply record the status of a row > >>>>>>>> that is created and expired in the same multi-statement > transaction. > >>>>>>>> For that reason, a special combo command id is created that > >>>>>>>> references > >>>>>>>> a local memory hash that contains the actual cmin and cmax values. > >>>>>>>> It means that if combo id is being used the number we are seeing > >>>>>>>> would not be the cmin or cmax it will be an index into a local > >>>>>>>> array that contains a structure with has the actual cmin and cmax > >>>>>>>> values. > >>>>>>>> > >>>>>>>> The following queries (taken mostly from > >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) > >>>>>>>> use the contrib module pageinspect, which allows > >>>>>>>> visibility of internal heap page structures and all stored rows, > >>>>>>>> including those not visible in the current snapshot. > >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) > >>>>>>>> > >>>>>>>> We are exploring 3 examples here: > >>>>>>>> 1) INSERT & DELETE in a single transaction > >>>>>>>> 2) INSERT & UPDATE in a single transaction > >>>>>>>> 3) INSERT from two different transactions & UPDATE from one > >>>>>>>> > >>>>>>>> test=# drop table mvcc_demo; > >>>>>>>> DROP TABLE > >>>>>>>> test=# > >>>>>>>> test=# create table mvcc_demo (val int); > >>>>>>>> CREATE TABLE > >>>>>>>> test=# > >>>>>>>> test=# TRUNCATE mvcc_demo; > >>>>>>>> TRUNCATE TABLE > >>>>>>>> test=# > >>>>>>>> test=# BEGIN; > >>>>>>>> BEGIN > >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that > >>>>>>>> combo id would be different > >>>>>>>> DELETE 0 > >>>>>>>> test=# DELETE FROM mvcc_demo; > >>>>>>>> DELETE 0 > >>>>>>>> test=# DELETE FROM mvcc_demo; > >>>>>>>> DELETE 0 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+------+-----------+------------- > >>>>>>>> 80685 | 0 | 3 | f > >>>>>>>> 80685 | 0 | 4 | f > >>>>>>>> 80685 | 0 | 5 | f > >>>>>>>> (3 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# DELETE FROM mvcc_demo; > >>>>>>>> DELETE 3 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+-------+-----------+------------- > >>>>>>>> 80685 | 80685 | 0 | t > >>>>>>>> 80685 | 80685 | 1 | t > >>>>>>>> 80685 | 80685 | 2 | t > >>>>>>>> (3 rows) > >>>>>>>> > >>>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax > >>>>>>>> they are actually > >>>>>>>> the indexes of the internal array already explained above. > >>>>>>>> combo id index 0 would contain cmin 3, cmax 6 > >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 > >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# END; > >>>>>>>> COMMIT > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# TRUNCATE mvcc_demo; > >>>>>>>> TRUNCATE TABLE > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# BEGIN; > >>>>>>>> BEGIN > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+------+-----------+------------- > >>>>>>>> 80675 | 0 | 0 | f > >>>>>>>> 80675 | 0 | 1 | f > >>>>>>>> 80675 | 0 | 2 | f > >>>>>>>> (3 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; > >>>>>>>> UPDATE 3 > >>>>>>>> test=# > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+-------+-----------+------------- > >>>>>>>> 80675 | 80675 | 0 | t > >>>>>>>> 80675 | 80675 | 1 | t > >>>>>>>> 80675 | 80675 | 2 | t > >>>>>>>> 80675 | 0 | 3 | f > >>>>>>>> 80675 | 0 | 3 | f > >>>>>>>> 80675 | 0 | 3 | f > >>>>>>>> (6 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# END; > >>>>>>>> COMMIT > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# TRUNCATE mvcc_demo; > >>>>>>>> TRUNCATE TABLE > >>>>>>>> test=# > >>>>>>>> > >>>>>>>> -- From one psql issue > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool > AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+------+-----------+------------- > >>>>>>>> 80677 | 0 | 0 | f > >>>>>>>> (1 row) > >>>>>>>> > >>>>>>>> > >>>>>>>> test=# -- From another issue > >>>>>>>> test=# BEGIN; > >>>>>>>> BEGIN > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+------+-----------+------------- > >>>>>>>> 80677 | 0 | 0 | f > >>>>>>>> 80678 | 0 | 0 | f > >>>>>>>> 80678 | 0 | 1 | f > >>>>>>>> 80678 | 0 | 2 | f > >>>>>>>> (4 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; > >>>>>>>> UPDATE 4 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+-------+-----------+------------- > >>>>>>>> 80678 | 80678 | 0 | t > >>>>>>>> 80678 | 80678 | 1 | t > >>>>>>>> 80678 | 80678 | 2 | t > >>>>>>>> 80677 | 80678 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> (8 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> > >>>>>>>> test=# -- Before finishing this, issue these from the first > psql > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool > AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+-------+-----------+------------- > >>>>>>>> 80678 | 80678 | 0 | t > >>>>>>>> 80678 | 80678 | 1 | t > >>>>>>>> 80678 | 80678 | 2 | t > >>>>>>>> 80677 | 80678 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> (8 rows) > >>>>>>>> > >>>>>>>> test=# END; > >>>>>>>> COMMIT > >>>>>>>> > >>>>>>>> > >>>>>>>> Now consider the case we are trying to solve > >>>>>>>> > >>>>>>>> drop table tt1; > >>>>>>>> create table tt1(f1 int); > >>>>>>>> > >>>>>>>> BEGIN; > >>>>>>>> insert into tt1 values(1); > >>>>>>>> declare c50 cursor for select * from tt1; -- should show one row > >>>>>>>> only > >>>>>>>> insert into tt1 values(2); > >>>>>>>> fetch all from c50; > >>>>>>>> COMMIT; > >>>>>>>> > >>>>>>>> > >>>>>>>> Consider Data node 1 log > >>>>>>>> > >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL > read > >>>>>>>> committed READ WRITE] > >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] > >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] > >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] > >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL > read > >>>>>>>> committed READ WRITE] > >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] > >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] > >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] > >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL > read > >>>>>>>> committed READ WRITE] > >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] > >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] > >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() > >>>>>>>> FROM tt1] > >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] > >>>>>>>> > >>>>>>>> The cursor currently shows both inserted rows because command id > at > >>>>>>>> data node in > >>>>>>>> step (j) is 0 > >>>>>>>> step (k) is 1 & > >>>>>>>> step (l) is 2 > >>>>>>>> > >>>>>>>> Where as we need command ids to be > >>>>>>>> > >>>>>>>> step (j) should be 0 > >>>>>>>> step (k) should be 2 & > >>>>>>>> step (l) should be 1 > >>>>>>>> > >>>>>>>> This will solve the cursor visibility problem. > >>>>>>>> > >>>>>>>> To implement this I suggest we send command IDs to data nodes from > >>>>>>>> the coordinator > >>>>>>>> like we send gxid. The only difference will be that we do not need > >>>>>>>> to take command IDs > >>>>>>>> from GTM since they are only valid with in the transaction. > >>>>>>>> > >>>>>>>> See this example > >>>>>>>> > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> ------+------+------+------+---- > >>>>>>>> (0 rows) > >>>>>>>> > >>>>>>>> test=# begin; > >>>>>>>> BEGIN > >>>>>>>> test=# insert into tt1 values(1); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> (1 row) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(2); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> (2 rows) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(3); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> (3 rows) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(4); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> 80615 | 0 | 3 | 3 | 4 > >>>>>>>> (4 rows) > >>>>>>>> > >>>>>>>> test=# end; > >>>>>>>> COMMIT > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> 80615 | 0 | 3 | 3 | 4 > >>>>>>>> (4 rows) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(5); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> 80615 | 0 | 3 | 3 | 4 > >>>>>>>> 80616 | 0 | 0 | 0 | 5 > >>>>>>>> (5 rows) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(6); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> 80615 | 0 | 3 | 3 | 4 > >>>>>>>> 80616 | 0 | 0 | 0 | 5 > >>>>>>>> 80617 | 0 | 0 | 0 | 6 > >>>>>>>> (6 rows) > >>>>>>>> > >>>>>>>> Note that at the end of the multi-statement transaction the > command > >>>>>>>> id gets reset to zero. > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Abbas > >>>>>>>> Architect > >>>>>>>> EnterpriseDB Corporation > >>>>>>>> The Enterprise PostgreSQL Company > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> -- > >>>>>>> Abbas > >>>>>>> Architect > >>>>>>> EnterpriseDB Corporation > >>>>>>> The Enterprise PostgreSQL Company > >>>>>>> > >>>>>>> Phone: 92-334-5100153 > >>>>>>> > >>>>>>> Website: www.enterprisedb.com > >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ > >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb > >>>>>>> > >>>>>>> This e-mail message (and any attachment) is intended for the use of > >>>>>>> the individual or entity to whom it is addressed. This message > >>>>>>> contains information from EnterpriseDB Corporation that may be > >>>>>>> privileged, confidential, or exempt from disclosure under > applicable > >>>>>>> law. If you are not the intended recipient or authorized to receive > >>>>>>> this for the intended recipient, any use, dissemination, > >>>>>>> distribution, > >>>>>>> retention, archiving, or copying of this communication is strictly > >>>>>>> prohibited. If you have received this e-mail in error, please > notify > >>>>>>> the sender immediately by reply e-mail and delete this message. > >>>>>>> > >>>>>>> > >>>>>>> > ------------------------------------------------------------------------------ > >>>>>>> Live Security Virtual Conference > >>>>>>> Exclusive live event will cover all the ways today's security and > >>>>>>> threat landscape has changed and how IT managers can respond. > >>>>>>> Discussions > >>>>>>> will include endpoint security, mobile security and the latest in > >>>>>>> malware > >>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >>>>>>> _______________________________________________ > >>>>>>> Postgres-xc-developers mailing list > >>>>>>> Pos...@li... > >>>>>>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >>>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> Michael Paquier > >>>>>> http://michael.otacoo.com > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> -- > >>>>> Abbas > >>>>> Architect > >>>>> EnterpriseDB Corporation > >>>>> The Enterprise PostgreSQL Company > >>>>> > >>>>> Phone: 92-334-5100153 > >>>>> > >>>>> Website: www.enterprisedb.com > >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ > >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb > >>>>> > >>>>> This e-mail message (and any attachment) is intended for the use of > >>>>> the individual or entity to whom it is addressed. This message > >>>>> contains information from EnterpriseDB Corporation that may be > >>>>> privileged, confidential, or exempt from disclosure under applicable > >>>>> law. If you are not the intended recipient or authorized to receive > >>>>> this for the intended recipient, any use, dissemination, > distribution, > >>>>> retention, archiving, or copying of this communication is strictly > >>>>> prohibited. If you have received this e-mail in error, please notify > >>>>> the sender immediately by reply e-mail and delete this message. > >>>>> > >>>>> > >>>>> > ------------------------------------------------------------------------------ > >>>>> Live Security Virtual Conference > >>>>> Exclusive live event will cover all the ways today's security and > >>>>> threat landscape has changed and how IT managers can respond. > >>>>> Discussions > >>>>> will include endpoint security, mobile security and the latest in > >>>>> malware > >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >>>>> _______________________________________________ > >>>>> Postgres-xc-developers mailing list > >>>>> Pos...@li... > >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >>>>> > >>>> > >>>> > >>>> > >>>> -- > >>>> Best Wishes, > >>>> Ashutosh Bapat > >>>> EntepriseDB Corporation > >>>> The Enterprise Postgres Company > >>>> > >>> > >>> > >>> > >>> -- > >>> -- > >>> Abbas > >>> Architect > >>> EnterpriseDB Corporation > >>> The Enterprise PostgreSQL Company > >>> > >>> Phone: 92-334-5100153 > >>> > >>> Website: www.enterprisedb.com > >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ > >>> Follow us on Twitter: http://www.twitter.com/enterprisedb > >>> > >>> This e-mail message (and any attachment) is intended for the use of > >>> the individual or entity to whom it is addressed. This message > >>> contains information from EnterpriseDB Corporation that may be > >>> privileged, confidential, or exempt from disclosure under applicable > >>> law. If you are not the intended recipient or authorized to receive > >>> this for the intended recipient, any use, dissemination, distribution, > >>> retention, archiving, or copying of this communication is strictly > >>> prohibited. If you have received this e-mail in error, please notify > >>> the sender immediately by reply e-mail and delete this message. > >> > >> > >> > >> > >> -- > >> -- > >> Abbas > >> Architect > >> EnterpriseDB Corporation > >> The Enterprise PostgreSQL Company > >> > >> Phone: 92-334-5100153 > >> > >> Website: www.enterprisedb.com > >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ > >> Follow us on Twitter: http://www.twitter.com/enterprisedb > >> > >> This e-mail message (and any attachment) is intended for the use of > >> the individual or entity to whom it is addressed. This message > >> contains information from EnterpriseDB Corporation that may be > >> privileged, confidential, or exempt from disclosure under applicable > >> law. If you are not the intended recipient or authorized to receive > >> this for the intended recipient, any use, dissemination, distribution, > >> retention, archiving, or copying of this communication is strictly > >> prohibited. If you have received this e-mail in error, please notify > >> the sender immediately by reply e-mail and delete this message. > > > > > > > > > > -- > > -- > > Abbas > > Architect > > EnterpriseDB Corporation > > The Enterprise PostgreSQL Company > > > > Phone: 92-334-5100153 > > > > Website: www.enterprisedb.com > > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > > Follow us on Twitter: http://www.twitter.com/enterprisedb > > > > This e-mail message (and any attachment) is intended for the use of > > the individual or entity to whom it is addressed. This message > > contains information from EnterpriseDB Corporation that may be > > privileged, confidential, or exempt from disclosure under applicable > > law. If you are not the intended recipient or authorized to receive > > this for the intended recipient, any use, dissemination, distribution, > > retention, archiving, or copying of this communication is strictly > > prohibited. If you have received this e-mail in error, please notify > > the sender immediately by reply e-mail and delete this message. > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Postgres-xc-developers mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Ashutosh B. <ash...@en...> - 2012-07-19 04:21:06
|
Hi Abbas, You have mentioned "design discussion earlier". Do we have that in some mail. Even better if you could put that down here in writing for ready reference. On Thu, Jul 19, 2012 at 4:26 AM, Abbas Butt <abb...@en...>wrote: > Hi, > Attached please find the patch that solves cursor visibility problem as > well as the inserting into child by selecting from the parent problem using > the design discussed earlier. > Test cases are added and regressions are fixed accordingly. > Patch is ready for review. > > Thanks. > > > On Tue, Jul 10, 2012 at 6:54 AM, Koichi Suzuki <koi...@gm...>wrote: > >> I basically agree on the idea. I also think Michael needs this kind >> of extension to feedback trigger target rows. We should design >> NoticeResponse extensible enough for various usage. >> >> Regards; >> ---------- >> Koichi Suzuki >> >> >> 2012/7/10 Abbas Butt <abb...@en...>: >> > Here is the update on the issue. >> > >> > It was decided that the changes done by the data nodes in the command id >> > should be communicated back to the coordinator and that the coordinator >> > should choose the largest of all the received values as the next >> command id. >> > >> > It was suggested that we should check that a skipped value of command id >> > should not create a problem for the next operations on the table. I have >> > verified both by studying the code and by actually changing the function >> > CommandCounterIncrement to increment the command id by 3 and running >> > regression. It worked fine so a hole in command id is not a problem. >> > >> > Next it was suggested that we should use the mechanism currently in >> place to >> > send # of tuples affected by a statement to communicate the changed >> command >> > id to the coordinator. >> > Please refer to this link in the documentation >> > http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html >> > Note that there is no such message format that exists in the current >> over >> > the wire protocol to communicate the # of tuples affected by a >> statement. >> > The libpq functions that we might suspect do the same are PQntuples and >> > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where as >> > PQcmdTuples extracts the # of tuples affected from the CommandComplete >> 'C' >> > message string. We cannot use these mechanisms for our purpose. >> > >> > I evaluated the use of NoticeResponse 'N' for sending changed command >> id but >> > the message format of NoticeResponse mandates the use of certain fields >> > which will make our sent messages un-necessarily bulky and would consume >> > network bandwidth for no reason. >> > >> > I therefore suggest the we use a new message for communicating XC >> specific >> > information from data node to coordinator. Currently we will use it for >> > command id but we will design the message format flexible enough to >> > accommodate future XC requirements. Whenever the data node increments >> > command id we will send the information to the coordinator and >> > handle_response function in execRemote.c would be changed to >> accommodate new >> > message. Since coordinators will never use the new message therefore the >> > existing clients do not need to bother. >> > >> > Comments or suggestions are welcome. >> > >> > Regards >> > >> > >> > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en... >> > >> > wrote: >> >> >> >> While fixing the regression failures resulting from the changes done by >> >> the patch I was able to fix all except this test case >> >> >> >> set enforce_two_phase_commit = off; >> >> >> >> CREATE TEMP TABLE users ( >> >> id INT PRIMARY KEY, >> >> name VARCHAR NOT NULL >> >> ) DISTRIBUTE BY REPLICATION; >> >> >> >> INSERT INTO users VALUES (1, 'Jozko'); >> >> INSERT INTO users VALUES (2, 'Ferko'); >> >> INSERT INTO users VALUES (3, 'Samko'); >> >> CREATE TEMP TABLE tasks ( >> >> id INT PRIMARY KEY, >> >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL >> >> ) DISTRIBUTE BY REPLICATION; >> >> >> >> INSERT INTO tasks VALUES (1,1,NULL,NULL); >> >> INSERT INTO tasks VALUES (2,2,2,NULL); >> >> INSERT INTO tasks VALUES (3,3,3,3); >> >> >> >> BEGIN; >> >> >> >> UPDATE tasks set id=id WHERE id=2; >> >> SELECT * FROM tasks; >> >> >> >> DELETE FROM users WHERE id = 2; >> >> SELECT * FROM tasks; >> >> >> >> COMMIT; >> >> >> >> The obtained output from the last select statement is >> >> >> >> id | owner | worker | checked_by >> >> ----+-------+--------+------------ >> >> 1 | 1 | | >> >> 3 | 3 | 3 | 3 >> >> 2 | 2 | 2 | >> >> (3 rows) >> >> >> >> where as the expected output is >> >> >> >> id | owner | worker | checked_by >> >> ----+-------+--------+------------ >> >> 1 | 1 | | >> >> 3 | 3 | 3 | 3 >> >> 2 | | | >> >> (3 rows) >> >> >> >> Note that the owner and worker have been set to null due to "ON DELETE >> SET >> >> NULL". >> >> >> >> Here is the reason why this does not work properly. Consider the last >> >> transaction >> >> >> >> BEGIN; >> >> >> >> UPDATE tasks set id=id WHERE id=2; >> >> SELECT * FROM tasks; >> >> >> >> DELETE FROM users WHERE id = 2; >> >> SELECT * FROM tasks; >> >> >> >> COMMIT; >> >> >> >> Here are the command id values the coordinator sends to the data node >> >> >> >> 0 for the first update that gets incremented to 1 because this is a DML >> >> and needs to consume a command id >> >> 1 for the first select that remains 1 since it is not required to be >> >> consumed. >> >> 1 for the delete statement that gets incremented to 2 because it is a >> DML >> >> and 2 for the last select. >> >> >> >> Now this is what happens on the data node >> >> >> >> When the data node receives the first update with command id 0, it >> >> increments it once due to the update itself and once due to the update >> run >> >> because of "ON UPDATE CASCADE". Hence the command id at the end of >> update on >> >> data node is 2. >> >> The first select comes to data node with command id 1, which is >> incorrect. >> >> The user's intention is to see data after update and its command id >> should >> >> be 2. >> >> Now delete comes with command id 1, and data node increments it once >> due >> >> to the delete itself and once due to the update run because of "ON >> DELETE >> >> SET NULL", hence the command id at the end of delete is 3. >> >> Coordinator now sends last select with command id 2, which is again >> >> incorrect since user's intention is to see data after delete and select >> >> should have been sent to data node with command id 3 or 4. >> >> >> >> Every time data node increments command id due to any statements run >> >> implicitly either because of the constraints or triggers, this scheme >> of >> >> sending command ids to data node from coordinator to solve fetch >> problems >> >> would fail. >> >> >> >> Datanode can have a trigger e.g. inserting rows thrice on every single >> >> insert and would increment command id on every insert. Therefore this >> design >> >> cannot work. >> >> >> >> Either we have to synchronize command ids between datanode and >> coordinator >> >> through GTM >> >> OR >> >> We will have to send the DECLARE CURSOR down to the datanode. In this >> case >> >> however we will not be able to send the cursor query as it is because >> the >> >> query might contain a join on two tables which exist on a disjoint set >> of >> >> data nodes. >> >> >> >> Comments or suggestions are welcome. >> >> >> >> >> >> >> >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt < >> abb...@en...> >> >> wrote: >> >>> >> >>> Thanks for your comments. >> >>> >> >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat >> >>> <ash...@en...> wrote: >> >>>> >> >>>> Hi Abbas, >> >>>> I have few comments to make >> >>>> 1. With this patch there are two variables for having command Id, >> that >> >>>> is going to cause confusion and will be a maintenance burden, might >> be error >> >>>> prone. Is it possible to use a single variable instead of two? >> >>> >> >>> >> >>> Are you talking about receivedCommandId and currentCommandId? If yes, >> I >> >>> would prefer not having a packet received from coordinator overwrite >> the >> >>> currentCommandId at data node, because I am not 100% sure about the >> life >> >>> time of currentCommandId, I might overwrite it before time. It would >> be safe >> >>> to let currentCommandId as is unless we are compelled to get the next >> >>> command ID, and have the received command id take priority at that >> time. >> >>> >> >>>> >> >>>> Right now there is some code which is specific to cursors in your >> patch. >> >>>> If you can plug the coordinator command id somehow into >> currentCommandId, >> >>>> you won't need that code and any other code which needs coordinator >> command >> >>>> ID will be automatically taken care of. >> >>> >> >>> >> >>> That code is required to solve a problem. Consider this case when a >> >>> coordinator received this transaction >> >>> >> >>> >> >>> BEGIN; >> >>> insert into tt1 values(1); >> >>> declare c50 cursor for select * from tt1; >> >>> insert into tt1 values(2); >> >>> fetch all from c50; >> >>> COMMIT; >> >>> >> >>> While sending select to the data node in response to a fetch we need >> to >> >>> know what was the command ID of the declare cursor statement and we >> need to >> >>> send that command ID to the data node for this particular fetch. This >> is the >> >>> main idea behind this solution. >> >>> >> >>> The first insert goes to the data node with command id 0, the second >> >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When >> >>> coordinator sees fetch it needs to send select to the data node with >> command >> >>> ID 1 rather than 3. >> >>> >> >>> >> >>>> >> >>>> 2. A non-transaction on coordinator can spawn tranasactions on >> datanode >> >>>> or subtransactions (if there is already a transaction running). Does >> your >> >>>> patch handle that case? >> >>> >> >>> >> >>> No and it does not need to, because that case has no known problems >> that >> >>> we need to solve. I don't think my patch would impact any such case >> but I >> >>> will analyze any failures that I may get in regressions. >> >>> >> >>>> >> >>>> Should we do more thorough research in the transaction management, >> esp. >> >>>> to see the impact of getting same command id for two commands on the >> >>>> datanode? >> >>> >> >>> >> >>> If we issue two commands with the same command ID then we will >> definitely >> >>> have visibility issues according to the rules I have already >> explained. But >> >>> we will not have two commands sent to the data node with same command >> id. >> >>> >> >>>> >> >>>> >> >>>> >> >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt >> >>>> <abb...@en...> wrote: >> >>>>> >> >>>>> Hi Ashutosh, >> >>>>> Here are the results with the val column, Thanks. >> >>>>> >> >>>>> test=# drop table mvcc_demo; >> >>>>> DROP TABLE >> >>>>> test=# >> >>>>> test=# create table mvcc_demo (val int); >> >>>>> CREATE TABLE >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> test=# >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >> >>>>> combo id would be different >> >>>>> DELETE 0 >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 0 >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 0 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80689 | 0 | 3 | f >> >>>>> 80689 | 0 | 4 | f >> >>>>> 80689 | 0 | 5 | f >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80689 | 0 | 3 | 3 | 1 >> >>>>> 80689 | 0 | 4 | 4 | 2 >> >>>>> 80689 | 0 | 5 | 5 | 3 >> >>>>> >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 3 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80689 | 80689 | 0 | t >> >>>>> 80689 | 80689 | 1 | t >> >>>>> 80689 | 80689 | 2 | t >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> ------+------+------+------+----- >> >>>>> (0 rows) >> >>>>> >> >>>>> >> >>>>> test=# >> >>>>> test=# END; >> >>>>> COMMIT >> >>>>> test=# >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80693 | 0 | 0 | f >> >>>>> 80693 | 0 | 1 | f >> >>>>> 80693 | 0 | 2 | f >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80693 | 0 | 0 | 0 | 1 >> >>>>> 80693 | 0 | 1 | 1 | 2 >> >>>>> 80693 | 0 | 2 | 2 | 3 >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# UPDATE mvcc_demo SET val = 10; >> >>>>> >> >>>>> UPDATE 3 >> >>>>> test=# >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80693 | 80693 | 0 | t >> >>>>> 80693 | 80693 | 1 | t >> >>>>> 80693 | 80693 | 2 | t >> >>>>> 80693 | 0 | 3 | f >> >>>>> 80693 | 0 | 3 | f >> >>>>> 80693 | 0 | 3 | f >> >>>>> (6 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> (3 rows) >> >>>>> >> >>>>> >> >>>>> test=# >> >>>>> test=# END; >> >>>>> COMMIT >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> -- From one psql issue >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80699 | 0 | 0 | f >> >>>>> (1 row) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80699 | 0 | 0 | 0 | 1 >> >>>>> (1 row) >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# -- From another issue >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80699 | 0 | 0 | f >> >>>>> 80700 | 0 | 0 | f >> >>>>> 80700 | 0 | 1 | f >> >>>>> 80700 | 0 | 2 | f >> >>>>> (4 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80699 | 0 | 0 | 0 | 1 >> >>>>> 80700 | 0 | 0 | 0 | 2 >> >>>>> 80700 | 0 | 1 | 1 | 3 >> >>>>> 80700 | 0 | 2 | 2 | 4 >> >>>>> (4 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# UPDATE mvcc_demo SET val = 10; >> >>>>> >> >>>>> UPDATE 4 >> >>>>> test=# >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80700 | 80700 | 0 | t >> >>>>> 80700 | 80700 | 1 | t >> >>>>> 80700 | 80700 | 2 | t >> >>>>> 80699 | 80700 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> (8 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> (4 rows) >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# -- Before finishing this, issue these from the first psql >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80700 | 80700 | 0 | t >> >>>>> 80700 | 80700 | 1 | t >> >>>>> 80700 | 80700 | 2 | t >> >>>>> 80699 | 80700 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> (8 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+-------+------+------+----- >> >>>>> 80699 | 80700 | 3 | 3 | 1 >> >>>>> (1 row) >> >>>>> >> >>>>> test=# end; >> >>>>> COMMIT >> >>>>> >> >>>>> >> >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier >> >>>>> <mic...@gm...> wrote: >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, >> so >> >>>>>> be sure to send it to remote Datanodes really only if necessary. >> >>>>>> You should put more severe conditions blocking this function cid >> can >> >>>>>> easily get incremented in Postgres. >> >>>>>> >> >>>>>> Regards, >> >>>>>> >> >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt >> >>>>>> <abb...@en...> wrote: >> >>>>>>> >> >>>>>>> PFA a WIP patch implementing the design presented earlier. >> >>>>>>> The patch is WIP because it still has and FIXME and it shows some >> >>>>>>> regression failures that need to be fixed, but other than that it >> confirms >> >>>>>>> that the suggested design would work fine. The following test >> cases now work >> >>>>>>> fine >> >>>>>>> >> >>>>>>> drop table tt1; >> >>>>>>> create table tt1(f1 int) distribute by replication; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> >> >>>>>>> insert into tt1 values(2); >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> insert into tt1 values(2); >> >>>>>>> >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(3); >> >>>>>>> >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> declare c51 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(3); >> >>>>>>> fetch all from c50; >> >>>>>>> fetch all from c51; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> declare c51 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> insert into tt1 values(3); >> >>>>>>> fetch all from c50; >> >>>>>>> fetch all from c51; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt >> >>>>>>> <abb...@en...> wrote: >> >>>>>>>> >> >>>>>>>> Hi, >> >>>>>>>> >> >>>>>>>> In a multi-statement transaction each statement is given a >> command >> >>>>>>>> identifier >> >>>>>>>> starting from zero and incrementing for each statement. >> >>>>>>>> These command indentifers are required for extra tracking because >> >>>>>>>> each >> >>>>>>>> statement has its own visibility rules with in the transaction. >> >>>>>>>> For example, a cursor’s contents must remain unchanged even if >> later >> >>>>>>>> statements in the >> >>>>>>>> same transaction modify rows. Such tracking is implemented using >> >>>>>>>> system command id >> >>>>>>>> columns cmin/cmax, which is internally actually is a single >> column. >> >>>>>>>> >> >>>>>>>> cmin/cmax come into play in case of multi-statement transactions >> >>>>>>>> only, >> >>>>>>>> they are both zero otherwise. >> >>>>>>>> >> >>>>>>>> cmin "The command identifier of the statement within the >> inserting >> >>>>>>>> transaction." >> >>>>>>>> cmax "The command identifier of the statement within the deleting >> >>>>>>>> transaction." >> >>>>>>>> >> >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) >> >>>>>>>> >> >>>>>>>> ( // A heap tuple is valid >> >>>>>>>> "now" iff >> >>>>>>>> Xmin == my-transaction && // inserted by the >> current >> >>>>>>>> transaction >> >>>>>>>> Cmin < my-command && // before this command, >> and >> >>>>>>>> ( >> >>>>>>>> Xmax is null || // the row has not been >> >>>>>>>> deleted, or >> >>>>>>>> ( >> >>>>>>>> Xmax == my-transaction && // it was deleted by the >> >>>>>>>> current transaction >> >>>>>>>> Cmax >= my-command // but not before this >> >>>>>>>> command, >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> || // or >> >>>>>>>> ( >> >>>>>>>> Xmin is committed && // the row was inserted >> by >> >>>>>>>> a committed transaction, and >> >>>>>>>> ( >> >>>>>>>> Xmax is null || // the row has not been >> >>>>>>>> deleted, or >> >>>>>>>> ( >> >>>>>>>> Xmax == my-transaction && // the row is being >> deleted >> >>>>>>>> by this transaction >> >>>>>>>> Cmax >= my-command) || // but it's not deleted >> >>>>>>>> "yet", or >> >>>>>>>> ( >> >>>>>>>> Xmax != my-transaction && // the row was deleted >> by >> >>>>>>>> another transaction >> >>>>>>>> Xmax is not committed // that has not been >> >>>>>>>> committed >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> >> >>>>>>>> Because cmin and cmax are internally a single system column, >> >>>>>>>> it is therefore not possible to simply record the status of a row >> >>>>>>>> that is created and expired in the same multi-statement >> transaction. >> >>>>>>>> For that reason, a special combo command id is created that >> >>>>>>>> references >> >>>>>>>> a local memory hash that contains the actual cmin and cmax >> values. >> >>>>>>>> It means that if combo id is being used the number we are seeing >> >>>>>>>> would not be the cmin or cmax it will be an index into a local >> >>>>>>>> array that contains a structure with has the actual cmin and cmax >> >>>>>>>> values. >> >>>>>>>> >> >>>>>>>> The following queries (taken mostly from >> >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >> >>>>>>>> use the contrib module pageinspect, which allows >> >>>>>>>> visibility of internal heap page structures and all stored rows, >> >>>>>>>> including those not visible in the current snapshot. >> >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >> >>>>>>>> >> >>>>>>>> We are exploring 3 examples here: >> >>>>>>>> 1) INSERT & DELETE in a single transaction >> >>>>>>>> 2) INSERT & UPDATE in a single transaction >> >>>>>>>> 3) INSERT from two different transactions & UPDATE from one >> >>>>>>>> >> >>>>>>>> test=# drop table mvcc_demo; >> >>>>>>>> DROP TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# create table mvcc_demo (val int); >> >>>>>>>> CREATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show >> that >> >>>>>>>> combo id would be different >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80685 | 0 | 3 | f >> >>>>>>>> 80685 | 0 | 4 | f >> >>>>>>>> 80685 | 0 | 5 | f >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 3 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80685 | 80685 | 0 | t >> >>>>>>>> 80685 | 80685 | 1 | t >> >>>>>>>> 80685 | 80685 | 2 | t >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax >> >>>>>>>> they are actually >> >>>>>>>> the indexes of the internal array already explained above. >> >>>>>>>> combo id index 0 would contain cmin 3, cmax 6 >> >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 >> >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80675 | 0 | 0 | f >> >>>>>>>> 80675 | 0 | 1 | f >> >>>>>>>> 80675 | 0 | 2 | f >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >> >>>>>>>> UPDATE 3 >> >>>>>>>> test=# >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80675 | 80675 | 0 | t >> >>>>>>>> 80675 | 80675 | 1 | t >> >>>>>>>> 80675 | 80675 | 2 | t >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> (6 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> >> >>>>>>>> -- From one psql issue >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool >> AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80677 | 0 | 0 | f >> >>>>>>>> (1 row) >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> test=# -- From another issue >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80677 | 0 | 0 | f >> >>>>>>>> 80678 | 0 | 0 | f >> >>>>>>>> 80678 | 0 | 1 | f >> >>>>>>>> 80678 | 0 | 2 | f >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >> >>>>>>>> UPDATE 4 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80678 | 80678 | 0 | t >> >>>>>>>> 80678 | 80678 | 1 | t >> >>>>>>>> 80678 | 80678 | 2 | t >> >>>>>>>> 80677 | 80678 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> (8 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> >> >>>>>>>> test=# -- Before finishing this, issue these from the first >> psql >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool >> AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80678 | 80678 | 0 | t >> >>>>>>>> 80678 | 80678 | 1 | t >> >>>>>>>> 80678 | 80678 | 2 | t >> >>>>>>>> 80677 | 80678 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> (8 rows) >> >>>>>>>> >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Now consider the case we are trying to solve >> >>>>>>>> >> >>>>>>>> drop table tt1; >> >>>>>>>> create table tt1(f1 int); >> >>>>>>>> >> >>>>>>>> BEGIN; >> >>>>>>>> insert into tt1 values(1); >> >>>>>>>> declare c50 cursor for select * from tt1; -- should show one row >> >>>>>>>> only >> >>>>>>>> insert into tt1 values(2); >> >>>>>>>> fetch all from c50; >> >>>>>>>> COMMIT; >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Consider Data node 1 log >> >>>>>>>> >> >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >> >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >> >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >> >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >> >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >> >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >> >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >> >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >> >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() >> >>>>>>>> FROM tt1] >> >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >> >>>>>>>> >> >>>>>>>> The cursor currently shows both inserted rows because command id >> at >> >>>>>>>> data node in >> >>>>>>>> step (j) is 0 >> >>>>>>>> step (k) is 1 & >> >>>>>>>> step (l) is 2 >> >>>>>>>> >> >>>>>>>> Where as we need command ids to be >> >>>>>>>> >> >>>>>>>> step (j) should be 0 >> >>>>>>>> step (k) should be 2 & >> >>>>>>>> step (l) should be 1 >> >>>>>>>> >> >>>>>>>> This will solve the cursor visibility problem. >> >>>>>>>> >> >>>>>>>> To implement this I suggest we send command IDs to data nodes >> from >> >>>>>>>> the coordinator >> >>>>>>>> like we send gxid. The only difference will be that we do not >> need >> >>>>>>>> to take command IDs >> >>>>>>>> from GTM since they are only valid with in the transaction. >> >>>>>>>> >> >>>>>>>> See this example >> >>>>>>>> >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> ------+------+------+------+---- >> >>>>>>>> (0 rows) >> >>>>>>>> >> >>>>>>>> test=# begin; >> >>>>>>>> BEGIN >> >>>>>>>> test=# insert into tt1 values(1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> (1 row) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> (2 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(4); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# end; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(5); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >> >>>>>>>> (5 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(6); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >> >>>>>>>> 80617 | 0 | 0 | 0 | 6 >> >>>>>>>> (6 rows) >> >>>>>>>> >> >>>>>>>> Note that at the end of the multi-statement transaction the >> command >> >>>>>>>> id gets reset to zero. >> >>>>>>>> >> >>>>>>>> -- >> >>>>>>>> Abbas >> >>>>>>>> Architect >> >>>>>>>> EnterpriseDB Corporation >> >>>>>>>> The Enterprise PostgreSQL Company >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> -- >> >>>>>>> Abbas >> >>>>>>> Architect >> >>>>>>> EnterpriseDB Corporation >> >>>>>>> The Enterprise PostgreSQL Company >> >>>>>>> >> >>>>>>> Phone: 92-334-5100153 >> >>>>>>> >> >>>>>>> Website: www.enterprisedb.com >> >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>>>>>> >> >>>>>>> This e-mail message (and any attachment) is intended for the use >> of >> >>>>>>> the individual or entity to whom it is addressed. This message >> >>>>>>> contains information from EnterpriseDB Corporation that may be >> >>>>>>> privileged, confidential, or exempt from disclosure under >> applicable >> >>>>>>> law. If you are not the intended recipient or authorized to >> receive >> >>>>>>> this for the intended recipient, any use, dissemination, >> >>>>>>> distribution, >> >>>>>>> retention, archiving, or copying of this communication is strictly >> >>>>>>> prohibited. If you have received this e-mail in error, please >> notify >> >>>>>>> the sender immediately by reply e-mail and delete this message. >> >>>>>>> >> >>>>>>> >> >>>>>>> >> ------------------------------------------------------------------------------ >> >>>>>>> Live Security Virtual Conference >> >>>>>>> Exclusive live event will cover all the ways today's security and >> >>>>>>> threat landscape has changed and how IT managers can respond. >> >>>>>>> Discussions >> >>>>>>> will include endpoint security, mobile security and the latest in >> >>>>>>> malware >> >>>>>>> threats. >> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>>>>>> _______________________________________________ >> >>>>>>> Postgres-xc-developers mailing list >> >>>>>>> Pos...@li... >> >>>>>>> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> -- >> >>>>>> Michael Paquier >> >>>>>> http://michael.otacoo.com >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> -- >> >>>>> Abbas >> >>>>> Architect >> >>>>> EnterpriseDB Corporation >> >>>>> The Enterprise PostgreSQL Company >> >>>>> >> >>>>> Phone: 92-334-5100153 >> >>>>> >> >>>>> Website: www.enterprisedb.com >> >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>>>> >> >>>>> This e-mail message (and any attachment) is intended for the use of >> >>>>> the individual or entity to whom it is addressed. This message >> >>>>> contains information from EnterpriseDB Corporation that may be >> >>>>> privileged, confidential, or exempt from disclosure under applicable >> >>>>> law. If you are not the intended recipient or authorized to receive >> >>>>> this for the intended recipient, any use, dissemination, >> distribution, >> >>>>> retention, archiving, or copying of this communication is strictly >> >>>>> prohibited. If you have received this e-mail in error, please notify >> >>>>> the sender immediately by reply e-mail and delete this message. >> >>>>> >> >>>>> >> >>>>> >> ------------------------------------------------------------------------------ >> >>>>> Live Security Virtual Conference >> >>>>> Exclusive live event will cover all the ways today's security and >> >>>>> threat landscape has changed and how IT managers can respond. >> >>>>> Discussions >> >>>>> will include endpoint security, mobile security and the latest in >> >>>>> malware >> >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>>>> _______________________________________________ >> >>>>> Postgres-xc-developers mailing list >> >>>>> Pos...@li... >> >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>>>> >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> Best Wishes, >> >>>> Ashutosh Bapat >> >>>> EntepriseDB Corporation >> >>>> The Enterprise Postgres Company >> >>>> >> >>> >> >>> >> >>> >> >>> -- >> >>> -- >> >>> Abbas >> >>> Architect >> >>> EnterpriseDB Corporation >> >>> The Enterprise PostgreSQL Company >> >>> >> >>> Phone: 92-334-5100153 >> >>> >> >>> Website: www.enterprisedb.com >> >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>> >> >>> This e-mail message (and any attachment) is intended for the use of >> >>> the individual or entity to whom it is addressed. This message >> >>> contains information from EnterpriseDB Corporation that may be >> >>> privileged, confidential, or exempt from disclosure under applicable >> >>> law. If you are not the intended recipient or authorized to receive >> >>> this for the intended recipient, any use, dissemination, distribution, >> >>> retention, archiving, or copying of this communication is strictly >> >>> prohibited. If you have received this e-mail in error, please notify >> >>> the sender immediately by reply e-mail and delete this message. >> >> >> >> >> >> >> >> >> >> -- >> >> -- >> >> Abbas >> >> Architect >> >> EnterpriseDB Corporation >> >> The Enterprise PostgreSQL Company >> >> >> >> Phone: 92-334-5100153 >> >> >> >> Website: www.enterprisedb.com >> >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> >> >> This e-mail message (and any attachment) is intended for the use of >> >> the individual or entity to whom it is addressed. This message >> >> contains information from EnterpriseDB Corporation that may be >> >> privileged, confidential, or exempt from disclosure under applicable >> >> law. If you are not the intended recipient or authorized to receive >> >> this for the intended recipient, any use, dissemination, distribution, >> >> retention, archiving, or copying of this communication is strictly >> >> prohibited. If you have received this e-mail in error, please notify >> >> the sender immediately by reply e-mail and delete this message. >> > >> > >> > >> > >> > -- >> > -- >> > Abbas >> > Architect >> > EnterpriseDB Corporation >> > The Enterprise PostgreSQL Company >> > >> > Phone: 92-334-5100153 >> > >> > Website: www.enterprisedb.com >> > EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> > Follow us on Twitter: http://www.twitter.com/enterprisedb >> > >> > This e-mail message (and any attachment) is intended for the use of >> > the individual or entity to whom it is addressed. This message >> > contains information from EnterpriseDB Corporation that may be >> > privileged, confidential, or exempt from disclosure under applicable >> > law. If you are not the intended recipient or authorized to receive >> > this for the intended recipient, any use, dissemination, distribution, >> > retention, archiving, or copying of this communication is strictly >> > prohibited. If you have received this e-mail in error, please notify >> > the sender immediately by reply e-mail and delete this message. >> > >> > >> ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> Discussions >> > will include endpoint security, mobile security and the latest in >> malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2012-07-19 00:01:33
|
Great! I will try to grab some time to look into that, perhaps not today as I have several meetings. On Thu, Jul 19, 2012 at 7:56 AM, Abbas Butt <abb...@en...>wrote: > Hi, > Attached please find the patch that solves cursor visibility problem as > well as the inserting into child by selecting from the parent problem using > the design discussed earlier. > Test cases are added and regressions are fixed accordingly. > Patch is ready for review. > > Thanks. > > > On Tue, Jul 10, 2012 at 6:54 AM, Koichi Suzuki <koi...@gm...>wrote: > >> I basically agree on the idea. I also think Michael needs this kind >> of extension to feedback trigger target rows. We should design >> NoticeResponse extensible enough for various usage. >> >> Regards; >> ---------- >> Koichi Suzuki >> >> >> 2012/7/10 Abbas Butt <abb...@en...>: >> > Here is the update on the issue. >> > >> > It was decided that the changes done by the data nodes in the command id >> > should be communicated back to the coordinator and that the coordinator >> > should choose the largest of all the received values as the next >> command id. >> > >> > It was suggested that we should check that a skipped value of command id >> > should not create a problem for the next operations on the table. I have >> > verified both by studying the code and by actually changing the function >> > CommandCounterIncrement to increment the command id by 3 and running >> > regression. It worked fine so a hole in command id is not a problem. >> > >> > Next it was suggested that we should use the mechanism currently in >> place to >> > send # of tuples affected by a statement to communicate the changed >> command >> > id to the coordinator. >> > Please refer to this link in the documentation >> > http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html >> > Note that there is no such message format that exists in the current >> over >> > the wire protocol to communicate the # of tuples affected by a >> statement. >> > The libpq functions that we might suspect do the same are PQntuples and >> > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where as >> > PQcmdTuples extracts the # of tuples affected from the CommandComplete >> 'C' >> > message string. We cannot use these mechanisms for our purpose. >> > >> > I evaluated the use of NoticeResponse 'N' for sending changed command >> id but >> > the message format of NoticeResponse mandates the use of certain fields >> > which will make our sent messages un-necessarily bulky and would consume >> > network bandwidth for no reason. >> > >> > I therefore suggest the we use a new message for communicating XC >> specific >> > information from data node to coordinator. Currently we will use it for >> > command id but we will design the message format flexible enough to >> > accommodate future XC requirements. Whenever the data node increments >> > command id we will send the information to the coordinator and >> > handle_response function in execRemote.c would be changed to >> accommodate new >> > message. Since coordinators will never use the new message therefore the >> > existing clients do not need to bother. >> > >> > Comments or suggestions are welcome. >> > >> > Regards >> > >> > >> > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en... >> > >> > wrote: >> >> >> >> While fixing the regression failures resulting from the changes done by >> >> the patch I was able to fix all except this test case >> >> >> >> set enforce_two_phase_commit = off; >> >> >> >> CREATE TEMP TABLE users ( >> >> id INT PRIMARY KEY, >> >> name VARCHAR NOT NULL >> >> ) DISTRIBUTE BY REPLICATION; >> >> >> >> INSERT INTO users VALUES (1, 'Jozko'); >> >> INSERT INTO users VALUES (2, 'Ferko'); >> >> INSERT INTO users VALUES (3, 'Samko'); >> >> CREATE TEMP TABLE tasks ( >> >> id INT PRIMARY KEY, >> >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL >> >> ) DISTRIBUTE BY REPLICATION; >> >> >> >> INSERT INTO tasks VALUES (1,1,NULL,NULL); >> >> INSERT INTO tasks VALUES (2,2,2,NULL); >> >> INSERT INTO tasks VALUES (3,3,3,3); >> >> >> >> BEGIN; >> >> >> >> UPDATE tasks set id=id WHERE id=2; >> >> SELECT * FROM tasks; >> >> >> >> DELETE FROM users WHERE id = 2; >> >> SELECT * FROM tasks; >> >> >> >> COMMIT; >> >> >> >> The obtained output from the last select statement is >> >> >> >> id | owner | worker | checked_by >> >> ----+-------+--------+------------ >> >> 1 | 1 | | >> >> 3 | 3 | 3 | 3 >> >> 2 | 2 | 2 | >> >> (3 rows) >> >> >> >> where as the expected output is >> >> >> >> id | owner | worker | checked_by >> >> ----+-------+--------+------------ >> >> 1 | 1 | | >> >> 3 | 3 | 3 | 3 >> >> 2 | | | >> >> (3 rows) >> >> >> >> Note that the owner and worker have been set to null due to "ON DELETE >> SET >> >> NULL". >> >> >> >> Here is the reason why this does not work properly. Consider the last >> >> transaction >> >> >> >> BEGIN; >> >> >> >> UPDATE tasks set id=id WHERE id=2; >> >> SELECT * FROM tasks; >> >> >> >> DELETE FROM users WHERE id = 2; >> >> SELECT * FROM tasks; >> >> >> >> COMMIT; >> >> >> >> Here are the command id values the coordinator sends to the data node >> >> >> >> 0 for the first update that gets incremented to 1 because this is a DML >> >> and needs to consume a command id >> >> 1 for the first select that remains 1 since it is not required to be >> >> consumed. >> >> 1 for the delete statement that gets incremented to 2 because it is a >> DML >> >> and 2 for the last select. >> >> >> >> Now this is what happens on the data node >> >> >> >> When the data node receives the first update with command id 0, it >> >> increments it once due to the update itself and once due to the update >> run >> >> because of "ON UPDATE CASCADE". Hence the command id at the end of >> update on >> >> data node is 2. >> >> The first select comes to data node with command id 1, which is >> incorrect. >> >> The user's intention is to see data after update and its command id >> should >> >> be 2. >> >> Now delete comes with command id 1, and data node increments it once >> due >> >> to the delete itself and once due to the update run because of "ON >> DELETE >> >> SET NULL", hence the command id at the end of delete is 3. >> >> Coordinator now sends last select with command id 2, which is again >> >> incorrect since user's intention is to see data after delete and select >> >> should have been sent to data node with command id 3 or 4. >> >> >> >> Every time data node increments command id due to any statements run >> >> implicitly either because of the constraints or triggers, this scheme >> of >> >> sending command ids to data node from coordinator to solve fetch >> problems >> >> would fail. >> >> >> >> Datanode can have a trigger e.g. inserting rows thrice on every single >> >> insert and would increment command id on every insert. Therefore this >> design >> >> cannot work. >> >> >> >> Either we have to synchronize command ids between datanode and >> coordinator >> >> through GTM >> >> OR >> >> We will have to send the DECLARE CURSOR down to the datanode. In this >> case >> >> however we will not be able to send the cursor query as it is because >> the >> >> query might contain a join on two tables which exist on a disjoint set >> of >> >> data nodes. >> >> >> >> Comments or suggestions are welcome. >> >> >> >> >> >> >> >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt < >> abb...@en...> >> >> wrote: >> >>> >> >>> Thanks for your comments. >> >>> >> >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat >> >>> <ash...@en...> wrote: >> >>>> >> >>>> Hi Abbas, >> >>>> I have few comments to make >> >>>> 1. With this patch there are two variables for having command Id, >> that >> >>>> is going to cause confusion and will be a maintenance burden, might >> be error >> >>>> prone. Is it possible to use a single variable instead of two? >> >>> >> >>> >> >>> Are you talking about receivedCommandId and currentCommandId? If yes, >> I >> >>> would prefer not having a packet received from coordinator overwrite >> the >> >>> currentCommandId at data node, because I am not 100% sure about the >> life >> >>> time of currentCommandId, I might overwrite it before time. It would >> be safe >> >>> to let currentCommandId as is unless we are compelled to get the next >> >>> command ID, and have the received command id take priority at that >> time. >> >>> >> >>>> >> >>>> Right now there is some code which is specific to cursors in your >> patch. >> >>>> If you can plug the coordinator command id somehow into >> currentCommandId, >> >>>> you won't need that code and any other code which needs coordinator >> command >> >>>> ID will be automatically taken care of. >> >>> >> >>> >> >>> That code is required to solve a problem. Consider this case when a >> >>> coordinator received this transaction >> >>> >> >>> >> >>> BEGIN; >> >>> insert into tt1 values(1); >> >>> declare c50 cursor for select * from tt1; >> >>> insert into tt1 values(2); >> >>> fetch all from c50; >> >>> COMMIT; >> >>> >> >>> While sending select to the data node in response to a fetch we need >> to >> >>> know what was the command ID of the declare cursor statement and we >> need to >> >>> send that command ID to the data node for this particular fetch. This >> is the >> >>> main idea behind this solution. >> >>> >> >>> The first insert goes to the data node with command id 0, the second >> >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When >> >>> coordinator sees fetch it needs to send select to the data node with >> command >> >>> ID 1 rather than 3. >> >>> >> >>> >> >>>> >> >>>> 2. A non-transaction on coordinator can spawn tranasactions on >> datanode >> >>>> or subtransactions (if there is already a transaction running). Does >> your >> >>>> patch handle that case? >> >>> >> >>> >> >>> No and it does not need to, because that case has no known problems >> that >> >>> we need to solve. I don't think my patch would impact any such case >> but I >> >>> will analyze any failures that I may get in regressions. >> >>> >> >>>> >> >>>> Should we do more thorough research in the transaction management, >> esp. >> >>>> to see the impact of getting same command id for two commands on the >> >>>> datanode? >> >>> >> >>> >> >>> If we issue two commands with the same command ID then we will >> definitely >> >>> have visibility issues according to the rules I have already >> explained. But >> >>> we will not have two commands sent to the data node with same command >> id. >> >>> >> >>>> >> >>>> >> >>>> >> >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt >> >>>> <abb...@en...> wrote: >> >>>>> >> >>>>> Hi Ashutosh, >> >>>>> Here are the results with the val column, Thanks. >> >>>>> >> >>>>> test=# drop table mvcc_demo; >> >>>>> DROP TABLE >> >>>>> test=# >> >>>>> test=# create table mvcc_demo (val int); >> >>>>> CREATE TABLE >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> test=# >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >> >>>>> combo id would be different >> >>>>> DELETE 0 >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 0 >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 0 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80689 | 0 | 3 | f >> >>>>> 80689 | 0 | 4 | f >> >>>>> 80689 | 0 | 5 | f >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80689 | 0 | 3 | 3 | 1 >> >>>>> 80689 | 0 | 4 | 4 | 2 >> >>>>> 80689 | 0 | 5 | 5 | 3 >> >>>>> >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 3 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80689 | 80689 | 0 | t >> >>>>> 80689 | 80689 | 1 | t >> >>>>> 80689 | 80689 | 2 | t >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> ------+------+------+------+----- >> >>>>> (0 rows) >> >>>>> >> >>>>> >> >>>>> test=# >> >>>>> test=# END; >> >>>>> COMMIT >> >>>>> test=# >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80693 | 0 | 0 | f >> >>>>> 80693 | 0 | 1 | f >> >>>>> 80693 | 0 | 2 | f >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80693 | 0 | 0 | 0 | 1 >> >>>>> 80693 | 0 | 1 | 1 | 2 >> >>>>> 80693 | 0 | 2 | 2 | 3 >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# UPDATE mvcc_demo SET val = 10; >> >>>>> >> >>>>> UPDATE 3 >> >>>>> test=# >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80693 | 80693 | 0 | t >> >>>>> 80693 | 80693 | 1 | t >> >>>>> 80693 | 80693 | 2 | t >> >>>>> 80693 | 0 | 3 | f >> >>>>> 80693 | 0 | 3 | f >> >>>>> 80693 | 0 | 3 | f >> >>>>> (6 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> (3 rows) >> >>>>> >> >>>>> >> >>>>> test=# >> >>>>> test=# END; >> >>>>> COMMIT >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> -- From one psql issue >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80699 | 0 | 0 | f >> >>>>> (1 row) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80699 | 0 | 0 | 0 | 1 >> >>>>> (1 row) >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# -- From another issue >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80699 | 0 | 0 | f >> >>>>> 80700 | 0 | 0 | f >> >>>>> 80700 | 0 | 1 | f >> >>>>> 80700 | 0 | 2 | f >> >>>>> (4 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80699 | 0 | 0 | 0 | 1 >> >>>>> 80700 | 0 | 0 | 0 | 2 >> >>>>> 80700 | 0 | 1 | 1 | 3 >> >>>>> 80700 | 0 | 2 | 2 | 4 >> >>>>> (4 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# UPDATE mvcc_demo SET val = 10; >> >>>>> >> >>>>> UPDATE 4 >> >>>>> test=# >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80700 | 80700 | 0 | t >> >>>>> 80700 | 80700 | 1 | t >> >>>>> 80700 | 80700 | 2 | t >> >>>>> 80699 | 80700 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> (8 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> (4 rows) >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# -- Before finishing this, issue these from the first psql >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80700 | 80700 | 0 | t >> >>>>> 80700 | 80700 | 1 | t >> >>>>> 80700 | 80700 | 2 | t >> >>>>> 80699 | 80700 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> (8 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+-------+------+------+----- >> >>>>> 80699 | 80700 | 3 | 3 | 1 >> >>>>> (1 row) >> >>>>> >> >>>>> test=# end; >> >>>>> COMMIT >> >>>>> >> >>>>> >> >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier >> >>>>> <mic...@gm...> wrote: >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, >> so >> >>>>>> be sure to send it to remote Datanodes really only if necessary. >> >>>>>> You should put more severe conditions blocking this function cid >> can >> >>>>>> easily get incremented in Postgres. >> >>>>>> >> >>>>>> Regards, >> >>>>>> >> >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt >> >>>>>> <abb...@en...> wrote: >> >>>>>>> >> >>>>>>> PFA a WIP patch implementing the design presented earlier. >> >>>>>>> The patch is WIP because it still has and FIXME and it shows some >> >>>>>>> regression failures that need to be fixed, but other than that it >> confirms >> >>>>>>> that the suggested design would work fine. The following test >> cases now work >> >>>>>>> fine >> >>>>>>> >> >>>>>>> drop table tt1; >> >>>>>>> create table tt1(f1 int) distribute by replication; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> >> >>>>>>> insert into tt1 values(2); >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> insert into tt1 values(2); >> >>>>>>> >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(3); >> >>>>>>> >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> declare c51 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(3); >> >>>>>>> fetch all from c50; >> >>>>>>> fetch all from c51; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> declare c51 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> insert into tt1 values(3); >> >>>>>>> fetch all from c50; >> >>>>>>> fetch all from c51; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt >> >>>>>>> <abb...@en...> wrote: >> >>>>>>>> >> >>>>>>>> Hi, >> >>>>>>>> >> >>>>>>>> In a multi-statement transaction each statement is given a >> command >> >>>>>>>> identifier >> >>>>>>>> starting from zero and incrementing for each statement. >> >>>>>>>> These command indentifers are required for extra tracking because >> >>>>>>>> each >> >>>>>>>> statement has its own visibility rules with in the transaction. >> >>>>>>>> For example, a cursor’s contents must remain unchanged even if >> later >> >>>>>>>> statements in the >> >>>>>>>> same transaction modify rows. Such tracking is implemented using >> >>>>>>>> system command id >> >>>>>>>> columns cmin/cmax, which is internally actually is a single >> column. >> >>>>>>>> >> >>>>>>>> cmin/cmax come into play in case of multi-statement transactions >> >>>>>>>> only, >> >>>>>>>> they are both zero otherwise. >> >>>>>>>> >> >>>>>>>> cmin "The command identifier of the statement within the >> inserting >> >>>>>>>> transaction." >> >>>>>>>> cmax "The command identifier of the statement within the deleting >> >>>>>>>> transaction." >> >>>>>>>> >> >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) >> >>>>>>>> >> >>>>>>>> ( // A heap tuple is valid >> >>>>>>>> "now" iff >> >>>>>>>> Xmin == my-transaction && // inserted by the >> current >> >>>>>>>> transaction >> >>>>>>>> Cmin < my-command && // before this command, >> and >> >>>>>>>> ( >> >>>>>>>> Xmax is null || // the row has not been >> >>>>>>>> deleted, or >> >>>>>>>> ( >> >>>>>>>> Xmax == my-transaction && // it was deleted by the >> >>>>>>>> current transaction >> >>>>>>>> Cmax >= my-command // but not before this >> >>>>>>>> command, >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> || // or >> >>>>>>>> ( >> >>>>>>>> Xmin is committed && // the row was inserted >> by >> >>>>>>>> a committed transaction, and >> >>>>>>>> ( >> >>>>>>>> Xmax is null || // the row has not been >> >>>>>>>> deleted, or >> >>>>>>>> ( >> >>>>>>>> Xmax == my-transaction && // the row is being >> deleted >> >>>>>>>> by this transaction >> >>>>>>>> Cmax >= my-command) || // but it's not deleted >> >>>>>>>> "yet", or >> >>>>>>>> ( >> >>>>>>>> Xmax != my-transaction && // the row was deleted >> by >> >>>>>>>> another transaction >> >>>>>>>> Xmax is not committed // that has not been >> >>>>>>>> committed >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> >> >>>>>>>> Because cmin and cmax are internally a single system column, >> >>>>>>>> it is therefore not possible to simply record the status of a row >> >>>>>>>> that is created and expired in the same multi-statement >> transaction. >> >>>>>>>> For that reason, a special combo command id is created that >> >>>>>>>> references >> >>>>>>>> a local memory hash that contains the actual cmin and cmax >> values. >> >>>>>>>> It means that if combo id is being used the number we are seeing >> >>>>>>>> would not be the cmin or cmax it will be an index into a local >> >>>>>>>> array that contains a structure with has the actual cmin and cmax >> >>>>>>>> values. >> >>>>>>>> >> >>>>>>>> The following queries (taken mostly from >> >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >> >>>>>>>> use the contrib module pageinspect, which allows >> >>>>>>>> visibility of internal heap page structures and all stored rows, >> >>>>>>>> including those not visible in the current snapshot. >> >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >> >>>>>>>> >> >>>>>>>> We are exploring 3 examples here: >> >>>>>>>> 1) INSERT & DELETE in a single transaction >> >>>>>>>> 2) INSERT & UPDATE in a single transaction >> >>>>>>>> 3) INSERT from two different transactions & UPDATE from one >> >>>>>>>> >> >>>>>>>> test=# drop table mvcc_demo; >> >>>>>>>> DROP TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# create table mvcc_demo (val int); >> >>>>>>>> CREATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show >> that >> >>>>>>>> combo id would be different >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80685 | 0 | 3 | f >> >>>>>>>> 80685 | 0 | 4 | f >> >>>>>>>> 80685 | 0 | 5 | f >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 3 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80685 | 80685 | 0 | t >> >>>>>>>> 80685 | 80685 | 1 | t >> >>>>>>>> 80685 | 80685 | 2 | t >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax >> >>>>>>>> they are actually >> >>>>>>>> the indexes of the internal array already explained above. >> >>>>>>>> combo id index 0 would contain cmin 3, cmax 6 >> >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 >> >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80675 | 0 | 0 | f >> >>>>>>>> 80675 | 0 | 1 | f >> >>>>>>>> 80675 | 0 | 2 | f >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >> >>>>>>>> UPDATE 3 >> >>>>>>>> test=# >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80675 | 80675 | 0 | t >> >>>>>>>> 80675 | 80675 | 1 | t >> >>>>>>>> 80675 | 80675 | 2 | t >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> (6 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> >> >>>>>>>> -- From one psql issue >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool >> AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80677 | 0 | 0 | f >> >>>>>>>> (1 row) >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> test=# -- From another issue >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80677 | 0 | 0 | f >> >>>>>>>> 80678 | 0 | 0 | f >> >>>>>>>> 80678 | 0 | 1 | f >> >>>>>>>> 80678 | 0 | 2 | f >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >> >>>>>>>> UPDATE 4 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80678 | 80678 | 0 | t >> >>>>>>>> 80678 | 80678 | 1 | t >> >>>>>>>> 80678 | 80678 | 2 | t >> >>>>>>>> 80677 | 80678 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> (8 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> >> >>>>>>>> test=# -- Before finishing this, issue these from the first >> psql >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool >> AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80678 | 80678 | 0 | t >> >>>>>>>> 80678 | 80678 | 1 | t >> >>>>>>>> 80678 | 80678 | 2 | t >> >>>>>>>> 80677 | 80678 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> (8 rows) >> >>>>>>>> >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Now consider the case we are trying to solve >> >>>>>>>> >> >>>>>>>> drop table tt1; >> >>>>>>>> create table tt1(f1 int); >> >>>>>>>> >> >>>>>>>> BEGIN; >> >>>>>>>> insert into tt1 values(1); >> >>>>>>>> declare c50 cursor for select * from tt1; -- should show one row >> >>>>>>>> only >> >>>>>>>> insert into tt1 values(2); >> >>>>>>>> fetch all from c50; >> >>>>>>>> COMMIT; >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Consider Data node 1 log >> >>>>>>>> >> >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >> >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >> >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >> >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >> >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >> >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >> >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >> >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >> >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() >> >>>>>>>> FROM tt1] >> >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >> >>>>>>>> >> >>>>>>>> The cursor currently shows both inserted rows because command id >> at >> >>>>>>>> data node in >> >>>>>>>> step (j) is 0 >> >>>>>>>> step (k) is 1 & >> >>>>>>>> step (l) is 2 >> >>>>>>>> >> >>>>>>>> Where as we need command ids to be >> >>>>>>>> >> >>>>>>>> step (j) should be 0 >> >>>>>>>> step (k) should be 2 & >> >>>>>>>> step (l) should be 1 >> >>>>>>>> >> >>>>>>>> This will solve the cursor visibility problem. >> >>>>>>>> >> >>>>>>>> To implement this I suggest we send command IDs to data nodes >> from >> >>>>>>>> the coordinator >> >>>>>>>> like we send gxid. The only difference will be that we do not >> need >> >>>>>>>> to take command IDs >> >>>>>>>> from GTM since they are only valid with in the transaction. >> >>>>>>>> >> >>>>>>>> See this example >> >>>>>>>> >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> ------+------+------+------+---- >> >>>>>>>> (0 rows) >> >>>>>>>> >> >>>>>>>> test=# begin; >> >>>>>>>> BEGIN >> >>>>>>>> test=# insert into tt1 values(1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> (1 row) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> (2 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(4); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# end; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(5); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >> >>>>>>>> (5 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(6); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >> >>>>>>>> 80617 | 0 | 0 | 0 | 6 >> >>>>>>>> (6 rows) >> >>>>>>>> >> >>>>>>>> Note that at the end of the multi-statement transaction the >> command >> >>>>>>>> id gets reset to zero. >> >>>>>>>> >> >>>>>>>> -- >> >>>>>>>> Abbas >> >>>>>>>> Architect >> >>>>>>>> EnterpriseDB Corporation >> >>>>>>>> The Enterprise PostgreSQL Company >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> -- >> >>>>>>> Abbas >> >>>>>>> Architect >> >>>>>>> EnterpriseDB Corporation >> >>>>>>> The Enterprise PostgreSQL Company >> >>>>>>> >> >>>>>>> Phone: 92-334-5100153 >> >>>>>>> >> >>>>>>> Website: www.enterprisedb.com >> >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>>>>>> >> >>>>>>> This e-mail message (and any attachment) is intended for the use >> of >> >>>>>>> the individual or entity to whom it is addressed. This message >> >>>>>>> contains information from EnterpriseDB Corporation that may be >> >>>>>>> privileged, confidential, or exempt from disclosure under >> applicable >> >>>>>>> law. If you are not the intended recipient or authorized to >> receive >> >>>>>>> this for the intended recipient, any use, dissemination, >> >>>>>>> distribution, >> >>>>>>> retention, archiving, or copying of this communication is strictly >> >>>>>>> prohibited. If you have received this e-mail in error, please >> notify >> >>>>>>> the sender immediately by reply e-mail and delete this message. >> >>>>>>> >> >>>>>>> >> >>>>>>> >> ------------------------------------------------------------------------------ >> >>>>>>> Live Security Virtual Conference >> >>>>>>> Exclusive live event will cover all the ways today's security and >> >>>>>>> threat landscape has changed and how IT managers can respond. >> >>>>>>> Discussions >> >>>>>>> will include endpoint security, mobile security and the latest in >> >>>>>>> malware >> >>>>>>> threats. >> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>>>>>> _______________________________________________ >> >>>>>>> Postgres-xc-developers mailing list >> >>>>>>> Pos...@li... >> >>>>>>> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> -- >> >>>>>> Michael Paquier >> >>>>>> http://michael.otacoo.com >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> -- >> >>>>> Abbas >> >>>>> Architect >> >>>>> EnterpriseDB Corporation >> >>>>> The Enterprise PostgreSQL Company >> >>>>> >> >>>>> Phone: 92-334-5100153 >> >>>>> >> >>>>> Website: www.enterprisedb.com >> >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>>>> >> >>>>> This e-mail message (and any attachment) is intended for the use of >> >>>>> the individual or entity to whom it is addressed. This message >> >>>>> contains information from EnterpriseDB Corporation that may be >> >>>>> privileged, confidential, or exempt from disclosure under applicable >> >>>>> law. If you are not the intended recipient or authorized to receive >> >>>>> this for the intended recipient, any use, dissemination, >> distribution, >> >>>>> retention, archiving, or copying of this communication is strictly >> >>>>> prohibited. If you have received this e-mail in error, please notify >> >>>>> the sender immediately by reply e-mail and delete this message. >> >>>>> >> >>>>> >> >>>>> >> ------------------------------------------------------------------------------ >> >>>>> Live Security Virtual Conference >> >>>>> Exclusive live event will cover all the ways today's security and >> >>>>> threat landscape has changed and how IT managers can respond. >> >>>>> Discussions >> >>>>> will include endpoint security, mobile security and the latest in >> >>>>> malware >> >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>>>> _______________________________________________ >> >>>>> Postgres-xc-developers mailing list >> >>>>> Pos...@li... >> >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>>>> >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> Best Wishes, >> >>>> Ashutosh Bapat >> >>>> EntepriseDB Corporation >> >>>> The Enterprise Postgres Company >> >>>> >> >>> >> >>> >> >>> >> >>> -- >> >>> -- >> >>> Abbas >> >>> Architect >> >>> EnterpriseDB Corporation >> >>> The Enterprise PostgreSQL Company >> >>> >> >>> Phone: 92-334-5100153 >> >>> >> >>> Website: www.enterprisedb.com >> >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>> >> >>> This e-mail message (and any attachment) is intended for the use of >> >>> the individual or entity to whom it is addressed. This message >> >>> contains information from EnterpriseDB Corporation that may be >> >>> privileged, confidential, or exempt from disclosure under applicable >> >>> law. If you are not the intended recipient or authorized to receive >> >>> this for the intended recipient, any use, dissemination, distribution, >> >>> retention, archiving, or copying of this communication is strictly >> >>> prohibited. If you have received this e-mail in error, please notify >> >>> the sender immediately by reply e-mail and delete this message. >> >> >> >> >> >> >> >> >> >> -- >> >> -- >> >> Abbas >> >> Architect >> >> EnterpriseDB Corporation >> >> The Enterprise PostgreSQL Company >> >> >> >> Phone: 92-334-5100153 >> >> >> >> Website: www.enterprisedb.com >> >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> >> >> This e-mail message (and any attachment) is intended for the use of >> >> the individual or entity to whom it is addressed. This message >> >> contains information from EnterpriseDB Corporation that may be >> >> privileged, confidential, or exempt from disclosure under applicable >> >> law. If you are not the intended recipient or authorized to receive >> >> this for the intended recipient, any use, dissemination, distribution, >> >> retention, archiving, or copying of this communication is strictly >> >> prohibited. If you have received this e-mail in error, please notify >> >> the sender immediately by reply e-mail and delete this message. >> > >> > >> > >> > >> > -- >> > -- >> > Abbas >> > Architect >> > EnterpriseDB Corporation >> > The Enterprise PostgreSQL Company >> > >> > Phone: 92-334-5100153 >> > >> > Website: www.enterprisedb.com >> > EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> > Follow us on Twitter: http://www.twitter.com/enterprisedb >> > >> > This e-mail message (and any attachment) is intended for the use of >> > the individual or entity to whom it is addressed. This message >> > contains information from EnterpriseDB Corporation that may be >> > privileged, confidential, or exempt from disclosure under applicable >> > law. If you are not the intended recipient or authorized to receive >> > this for the intended recipient, any use, dissemination, distribution, >> > retention, archiving, or copying of this communication is strictly >> > prohibited. If you have received this e-mail in error, please notify >> > the sender immediately by reply e-mail and delete this message. >> > >> > >> ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> Discussions >> > will include endpoint security, mobile security and the latest in >> malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Michael Paquier http://michael.otacoo.com |
From: Abbas B. <abb...@en...> - 2012-07-19 04:07:04
|
Thanks Michael. On Thu, Jul 19, 2012 at 5:01 AM, Michael Paquier <mic...@gm...>wrote: > Great! I will try to grab some time to look into that, perhaps not today > as I have several meetings. > > > On Thu, Jul 19, 2012 at 7:56 AM, Abbas Butt <abb...@en...>wrote: > >> Hi, >> Attached please find the patch that solves cursor visibility problem as >> well as the inserting into child by selecting from the parent problem using >> the design discussed earlier. >> Test cases are added and regressions are fixed accordingly. >> Patch is ready for review. >> >> Thanks. >> >> >> On Tue, Jul 10, 2012 at 6:54 AM, Koichi Suzuki <koi...@gm... >> > wrote: >> >>> I basically agree on the idea. I also think Michael needs this kind >>> of extension to feedback trigger target rows. We should design >>> NoticeResponse extensible enough for various usage. >>> >>> Regards; >>> ---------- >>> Koichi Suzuki >>> >>> >>> 2012/7/10 Abbas Butt <abb...@en...>: >>> > Here is the update on the issue. >>> > >>> > It was decided that the changes done by the data nodes in the command >>> id >>> > should be communicated back to the coordinator and that the coordinator >>> > should choose the largest of all the received values as the next >>> command id. >>> > >>> > It was suggested that we should check that a skipped value of command >>> id >>> > should not create a problem for the next operations on the table. I >>> have >>> > verified both by studying the code and by actually changing the >>> function >>> > CommandCounterIncrement to increment the command id by 3 and running >>> > regression. It worked fine so a hole in command id is not a problem. >>> > >>> > Next it was suggested that we should use the mechanism currently in >>> place to >>> > send # of tuples affected by a statement to communicate the changed >>> command >>> > id to the coordinator. >>> > Please refer to this link in the documentation >>> > >>> http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html >>> > Note that there is no such message format that exists in the current >>> over >>> > the wire protocol to communicate the # of tuples affected by a >>> statement. >>> > The libpq functions that we might suspect do the same are PQntuples and >>> > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where >>> as >>> > PQcmdTuples extracts the # of tuples affected from the CommandComplete >>> 'C' >>> > message string. We cannot use these mechanisms for our purpose. >>> > >>> > I evaluated the use of NoticeResponse 'N' for sending changed command >>> id but >>> > the message format of NoticeResponse mandates the use of certain fields >>> > which will make our sent messages un-necessarily bulky and would >>> consume >>> > network bandwidth for no reason. >>> > >>> > I therefore suggest the we use a new message for communicating XC >>> specific >>> > information from data node to coordinator. Currently we will use it for >>> > command id but we will design the message format flexible enough to >>> > accommodate future XC requirements. Whenever the data node increments >>> > command id we will send the information to the coordinator and >>> > handle_response function in execRemote.c would be changed to >>> accommodate new >>> > message. Since coordinators will never use the new message therefore >>> the >>> > existing clients do not need to bother. >>> > >>> > Comments or suggestions are welcome. >>> > >>> > Regards >>> > >>> > >>> > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt < >>> abb...@en...> >>> > wrote: >>> >> >>> >> While fixing the regression failures resulting from the changes done >>> by >>> >> the patch I was able to fix all except this test case >>> >> >>> >> set enforce_two_phase_commit = off; >>> >> >>> >> CREATE TEMP TABLE users ( >>> >> id INT PRIMARY KEY, >>> >> name VARCHAR NOT NULL >>> >> ) DISTRIBUTE BY REPLICATION; >>> >> >>> >> INSERT INTO users VALUES (1, 'Jozko'); >>> >> INSERT INTO users VALUES (2, 'Ferko'); >>> >> INSERT INTO users VALUES (3, 'Samko'); >>> >> CREATE TEMP TABLE tasks ( >>> >> id INT PRIMARY KEY, >>> >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >>> >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >>> >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL >>> >> ) DISTRIBUTE BY REPLICATION; >>> >> >>> >> INSERT INTO tasks VALUES (1,1,NULL,NULL); >>> >> INSERT INTO tasks VALUES (2,2,2,NULL); >>> >> INSERT INTO tasks VALUES (3,3,3,3); >>> >> >>> >> BEGIN; >>> >> >>> >> UPDATE tasks set id=id WHERE id=2; >>> >> SELECT * FROM tasks; >>> >> >>> >> DELETE FROM users WHERE id = 2; >>> >> SELECT * FROM tasks; >>> >> >>> >> COMMIT; >>> >> >>> >> The obtained output from the last select statement is >>> >> >>> >> id | owner | worker | checked_by >>> >> ----+-------+--------+------------ >>> >> 1 | 1 | | >>> >> 3 | 3 | 3 | 3 >>> >> 2 | 2 | 2 | >>> >> (3 rows) >>> >> >>> >> where as the expected output is >>> >> >>> >> id | owner | worker | checked_by >>> >> ----+-------+--------+------------ >>> >> 1 | 1 | | >>> >> 3 | 3 | 3 | 3 >>> >> 2 | | | >>> >> (3 rows) >>> >> >>> >> Note that the owner and worker have been set to null due to "ON >>> DELETE SET >>> >> NULL". >>> >> >>> >> Here is the reason why this does not work properly. Consider the last >>> >> transaction >>> >> >>> >> BEGIN; >>> >> >>> >> UPDATE tasks set id=id WHERE id=2; >>> >> SELECT * FROM tasks; >>> >> >>> >> DELETE FROM users WHERE id = 2; >>> >> SELECT * FROM tasks; >>> >> >>> >> COMMIT; >>> >> >>> >> Here are the command id values the coordinator sends to the data node >>> >> >>> >> 0 for the first update that gets incremented to 1 because this is a >>> DML >>> >> and needs to consume a command id >>> >> 1 for the first select that remains 1 since it is not required to be >>> >> consumed. >>> >> 1 for the delete statement that gets incremented to 2 because it is a >>> DML >>> >> and 2 for the last select. >>> >> >>> >> Now this is what happens on the data node >>> >> >>> >> When the data node receives the first update with command id 0, it >>> >> increments it once due to the update itself and once due to the >>> update run >>> >> because of "ON UPDATE CASCADE". Hence the command id at the end of >>> update on >>> >> data node is 2. >>> >> The first select comes to data node with command id 1, which is >>> incorrect. >>> >> The user's intention is to see data after update and its command id >>> should >>> >> be 2. >>> >> Now delete comes with command id 1, and data node increments it once >>> due >>> >> to the delete itself and once due to the update run because of "ON >>> DELETE >>> >> SET NULL", hence the command id at the end of delete is 3. >>> >> Coordinator now sends last select with command id 2, which is again >>> >> incorrect since user's intention is to see data after delete and >>> select >>> >> should have been sent to data node with command id 3 or 4. >>> >> >>> >> Every time data node increments command id due to any statements run >>> >> implicitly either because of the constraints or triggers, this scheme >>> of >>> >> sending command ids to data node from coordinator to solve fetch >>> problems >>> >> would fail. >>> >> >>> >> Datanode can have a trigger e.g. inserting rows thrice on every single >>> >> insert and would increment command id on every insert. Therefore this >>> design >>> >> cannot work. >>> >> >>> >> Either we have to synchronize command ids between datanode and >>> coordinator >>> >> through GTM >>> >> OR >>> >> We will have to send the DECLARE CURSOR down to the datanode. In this >>> case >>> >> however we will not be able to send the cursor query as it is because >>> the >>> >> query might contain a join on two tables which exist on a disjoint >>> set of >>> >> data nodes. >>> >> >>> >> Comments or suggestions are welcome. >>> >> >>> >> >>> >> >>> >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt < >>> abb...@en...> >>> >> wrote: >>> >>> >>> >>> Thanks for your comments. >>> >>> >>> >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat >>> >>> <ash...@en...> wrote: >>> >>>> >>> >>>> Hi Abbas, >>> >>>> I have few comments to make >>> >>>> 1. With this patch there are two variables for having command Id, >>> that >>> >>>> is going to cause confusion and will be a maintenance burden, might >>> be error >>> >>>> prone. Is it possible to use a single variable instead of two? >>> >>> >>> >>> >>> >>> Are you talking about receivedCommandId and currentCommandId? If >>> yes, I >>> >>> would prefer not having a packet received from coordinator overwrite >>> the >>> >>> currentCommandId at data node, because I am not 100% sure about the >>> life >>> >>> time of currentCommandId, I might overwrite it before time. It would >>> be safe >>> >>> to let currentCommandId as is unless we are compelled to get the next >>> >>> command ID, and have the received command id take priority at that >>> time. >>> >>> >>> >>>> >>> >>>> Right now there is some code which is specific to cursors in your >>> patch. >>> >>>> If you can plug the coordinator command id somehow into >>> currentCommandId, >>> >>>> you won't need that code and any other code which needs coordinator >>> command >>> >>>> ID will be automatically taken care of. >>> >>> >>> >>> >>> >>> That code is required to solve a problem. Consider this case when a >>> >>> coordinator received this transaction >>> >>> >>> >>> >>> >>> BEGIN; >>> >>> insert into tt1 values(1); >>> >>> declare c50 cursor for select * from tt1; >>> >>> insert into tt1 values(2); >>> >>> fetch all from c50; >>> >>> COMMIT; >>> >>> >>> >>> While sending select to the data node in response to a fetch we need >>> to >>> >>> know what was the command ID of the declare cursor statement and we >>> need to >>> >>> send that command ID to the data node for this particular fetch. >>> This is the >>> >>> main idea behind this solution. >>> >>> >>> >>> The first insert goes to the data node with command id 0, the second >>> >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When >>> >>> coordinator sees fetch it needs to send select to the data node with >>> command >>> >>> ID 1 rather than 3. >>> >>> >>> >>> >>> >>>> >>> >>>> 2. A non-transaction on coordinator can spawn tranasactions on >>> datanode >>> >>>> or subtransactions (if there is already a transaction running). >>> Does your >>> >>>> patch handle that case? >>> >>> >>> >>> >>> >>> No and it does not need to, because that case has no known problems >>> that >>> >>> we need to solve. I don't think my patch would impact any such case >>> but I >>> >>> will analyze any failures that I may get in regressions. >>> >>> >>> >>>> >>> >>>> Should we do more thorough research in the transaction management, >>> esp. >>> >>>> to see the impact of getting same command id for two commands on the >>> >>>> datanode? >>> >>> >>> >>> >>> >>> If we issue two commands with the same command ID then we will >>> definitely >>> >>> have visibility issues according to the rules I have already >>> explained. But >>> >>> we will not have two commands sent to the data node with same >>> command id. >>> >>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt >>> >>>> <abb...@en...> wrote: >>> >>>>> >>> >>>>> Hi Ashutosh, >>> >>>>> Here are the results with the val column, Thanks. >>> >>>>> >>> >>>>> test=# drop table mvcc_demo; >>> >>>>> DROP TABLE >>> >>>>> test=# >>> >>>>> test=# create table mvcc_demo (val int); >>> >>>>> CREATE TABLE >>> >>>>> test=# >>> >>>>> test=# TRUNCATE mvcc_demo; >>> >>>>> TRUNCATE TABLE >>> >>>>> test=# >>> >>>>> test=# BEGIN; >>> >>>>> BEGIN >>> >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>> >>>>> combo id would be different >>> >>>>> DELETE 0 >>> >>>>> test=# DELETE FROM mvcc_demo; >>> >>>>> DELETE 0 >>> >>>>> test=# DELETE FROM mvcc_demo; >>> >>>>> DELETE 0 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>> INSERT 0 1 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+------+-----------+------------- >>> >>>>> 80689 | 0 | 3 | f >>> >>>>> 80689 | 0 | 4 | f >>> >>>>> 80689 | 0 | 5 | f >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80689 | 0 | 3 | 3 | 1 >>> >>>>> 80689 | 0 | 4 | 4 | 2 >>> >>>>> 80689 | 0 | 5 | 5 | 3 >>> >>>>> >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# DELETE FROM mvcc_demo; >>> >>>>> DELETE 3 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+-------+-----------+------------- >>> >>>>> 80689 | 80689 | 0 | t >>> >>>>> 80689 | 80689 | 1 | t >>> >>>>> 80689 | 80689 | 2 | t >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> ------+------+------+------+----- >>> >>>>> (0 rows) >>> >>>>> >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# END; >>> >>>>> COMMIT >>> >>>>> test=# >>> >>>>> test=# >>> >>>>> test=# TRUNCATE mvcc_demo; >>> >>>>> TRUNCATE TABLE >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> test=# BEGIN; >>> >>>>> BEGIN >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>> INSERT 0 1 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+------+-----------+------------- >>> >>>>> 80693 | 0 | 0 | f >>> >>>>> 80693 | 0 | 1 | f >>> >>>>> 80693 | 0 | 2 | f >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80693 | 0 | 0 | 0 | 1 >>> >>>>> 80693 | 0 | 1 | 1 | 2 >>> >>>>> 80693 | 0 | 2 | 2 | 3 >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# UPDATE mvcc_demo SET val = 10; >>> >>>>> >>> >>>>> UPDATE 3 >>> >>>>> test=# >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+-------+-----------+------------- >>> >>>>> 80693 | 80693 | 0 | t >>> >>>>> 80693 | 80693 | 1 | t >>> >>>>> 80693 | 80693 | 2 | t >>> >>>>> 80693 | 0 | 3 | f >>> >>>>> 80693 | 0 | 3 | f >>> >>>>> 80693 | 0 | 3 | f >>> >>>>> (6 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80693 | 0 | 3 | 3 | 10 >>> >>>>> 80693 | 0 | 3 | 3 | 10 >>> >>>>> 80693 | 0 | 3 | 3 | 10 >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# END; >>> >>>>> COMMIT >>> >>>>> test=# >>> >>>>> test=# TRUNCATE mvcc_demo; >>> >>>>> TRUNCATE TABLE >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> -- From one psql issue >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>> INSERT 0 1 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+------+-----------+------------- >>> >>>>> 80699 | 0 | 0 | f >>> >>>>> (1 row) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by >>> val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80699 | 0 | 0 | 0 | 1 >>> >>>>> (1 row) >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> test=# -- From another issue >>> >>>>> test=# BEGIN; >>> >>>>> BEGIN >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>> >>>>> INSERT 0 1 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+------+-----------+------------- >>> >>>>> 80699 | 0 | 0 | f >>> >>>>> 80700 | 0 | 0 | f >>> >>>>> 80700 | 0 | 1 | f >>> >>>>> 80700 | 0 | 2 | f >>> >>>>> (4 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80699 | 0 | 0 | 0 | 1 >>> >>>>> 80700 | 0 | 0 | 0 | 2 >>> >>>>> 80700 | 0 | 1 | 1 | 3 >>> >>>>> 80700 | 0 | 2 | 2 | 4 >>> >>>>> (4 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# UPDATE mvcc_demo SET val = 10; >>> >>>>> >>> >>>>> UPDATE 4 >>> >>>>> test=# >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+-------+-----------+------------- >>> >>>>> 80700 | 80700 | 0 | t >>> >>>>> 80700 | 80700 | 1 | t >>> >>>>> 80700 | 80700 | 2 | t >>> >>>>> 80699 | 80700 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> (8 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80700 | 0 | 3 | 3 | 10 >>> >>>>> 80700 | 0 | 3 | 3 | 10 >>> >>>>> 80700 | 0 | 3 | 3 | 10 >>> >>>>> 80700 | 0 | 3 | 3 | 10 >>> >>>>> (4 rows) >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> test=# -- Before finishing this, issue these from the first >>> psql >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+-------+-----------+------------- >>> >>>>> 80700 | 80700 | 0 | t >>> >>>>> 80700 | 80700 | 1 | t >>> >>>>> 80700 | 80700 | 2 | t >>> >>>>> 80699 | 80700 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> (8 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by >>> val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+-------+------+------+----- >>> >>>>> 80699 | 80700 | 3 | 3 | 1 >>> >>>>> (1 row) >>> >>>>> >>> >>>>> test=# end; >>> >>>>> COMMIT >>> >>>>> >>> >>>>> >>> >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier >>> >>>>> <mic...@gm...> wrote: >>> >>>>>> >>> >>>>>> Hi, >>> >>>>>> >>> >>>>>> I expect pgxc_node_send_cmd_id to have some impact on >>> performance, so >>> >>>>>> be sure to send it to remote Datanodes really only if necessary. >>> >>>>>> You should put more severe conditions blocking this function cid >>> can >>> >>>>>> easily get incremented in Postgres. >>> >>>>>> >>> >>>>>> Regards, >>> >>>>>> >>> >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt >>> >>>>>> <abb...@en...> wrote: >>> >>>>>>> >>> >>>>>>> PFA a WIP patch implementing the design presented earlier. >>> >>>>>>> The patch is WIP because it still has and FIXME and it shows some >>> >>>>>>> regression failures that need to be fixed, but other than that >>> it confirms >>> >>>>>>> that the suggested design would work fine. The following test >>> cases now work >>> >>>>>>> fine >>> >>>>>>> >>> >>>>>>> drop table tt1; >>> >>>>>>> create table tt1(f1 int) distribute by replication; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> fetch all from c50; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> fetch all from c50; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(3); >>> >>>>>>> >>> >>>>>>> fetch all from c50; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> declare c51 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(3); >>> >>>>>>> fetch all from c50; >>> >>>>>>> fetch all from c51; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> declare c51 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> insert into tt1 values(3); >>> >>>>>>> fetch all from c50; >>> >>>>>>> fetch all from c51; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt >>> >>>>>>> <abb...@en...> wrote: >>> >>>>>>>> >>> >>>>>>>> Hi, >>> >>>>>>>> >>> >>>>>>>> In a multi-statement transaction each statement is given a >>> command >>> >>>>>>>> identifier >>> >>>>>>>> starting from zero and incrementing for each statement. >>> >>>>>>>> These command indentifers are required for extra tracking >>> because >>> >>>>>>>> each >>> >>>>>>>> statement has its own visibility rules with in the transaction. >>> >>>>>>>> For example, a cursor’s contents must remain unchanged even if >>> later >>> >>>>>>>> statements in the >>> >>>>>>>> same transaction modify rows. Such tracking is implemented >>> using >>> >>>>>>>> system command id >>> >>>>>>>> columns cmin/cmax, which is internally actually is a single >>> column. >>> >>>>>>>> >>> >>>>>>>> cmin/cmax come into play in case of multi-statement transactions >>> >>>>>>>> only, >>> >>>>>>>> they are both zero otherwise. >>> >>>>>>>> >>> >>>>>>>> cmin "The command identifier of the statement within the >>> inserting >>> >>>>>>>> transaction." >>> >>>>>>>> cmax "The command identifier of the statement within the >>> deleting >>> >>>>>>>> transaction." >>> >>>>>>>> >>> >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) >>> >>>>>>>> >>> >>>>>>>> ( // A heap tuple is >>> valid >>> >>>>>>>> "now" iff >>> >>>>>>>> Xmin == my-transaction && // inserted by the >>> current >>> >>>>>>>> transaction >>> >>>>>>>> Cmin < my-command && // before this >>> command, and >>> >>>>>>>> ( >>> >>>>>>>> Xmax is null || // the row has not been >>> >>>>>>>> deleted, or >>> >>>>>>>> ( >>> >>>>>>>> Xmax == my-transaction && // it was deleted by >>> the >>> >>>>>>>> current transaction >>> >>>>>>>> Cmax >= my-command // but not before this >>> >>>>>>>> command, >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> || // or >>> >>>>>>>> ( >>> >>>>>>>> Xmin is committed && // the row was >>> inserted by >>> >>>>>>>> a committed transaction, and >>> >>>>>>>> ( >>> >>>>>>>> Xmax is null || // the row has not been >>> >>>>>>>> deleted, or >>> >>>>>>>> ( >>> >>>>>>>> Xmax == my-transaction && // the row is being >>> deleted >>> >>>>>>>> by this transaction >>> >>>>>>>> Cmax >= my-command) || // but it's not deleted >>> >>>>>>>> "yet", or >>> >>>>>>>> ( >>> >>>>>>>> Xmax != my-transaction && // the row was deleted >>> by >>> >>>>>>>> another transaction >>> >>>>>>>> Xmax is not committed // that has not been >>> >>>>>>>> committed >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> >>> >>>>>>>> Because cmin and cmax are internally a single system column, >>> >>>>>>>> it is therefore not possible to simply record the status of a >>> row >>> >>>>>>>> that is created and expired in the same multi-statement >>> transaction. >>> >>>>>>>> For that reason, a special combo command id is created that >>> >>>>>>>> references >>> >>>>>>>> a local memory hash that contains the actual cmin and cmax >>> values. >>> >>>>>>>> It means that if combo id is being used the number we are seeing >>> >>>>>>>> would not be the cmin or cmax it will be an index into a local >>> >>>>>>>> array that contains a structure with has the actual cmin and >>> cmax >>> >>>>>>>> values. >>> >>>>>>>> >>> >>>>>>>> The following queries (taken mostly from >>> >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>> >>>>>>>> use the contrib module pageinspect, which allows >>> >>>>>>>> visibility of internal heap page structures and all stored rows, >>> >>>>>>>> including those not visible in the current snapshot. >>> >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>> >>>>>>>> >>> >>>>>>>> We are exploring 3 examples here: >>> >>>>>>>> 1) INSERT & DELETE in a single transaction >>> >>>>>>>> 2) INSERT & UPDATE in a single transaction >>> >>>>>>>> 3) INSERT from two different transactions & UPDATE from one >>> >>>>>>>> >>> >>>>>>>> test=# drop table mvcc_demo; >>> >>>>>>>> DROP TABLE >>> >>>>>>>> test=# >>> >>>>>>>> test=# create table mvcc_demo (val int); >>> >>>>>>>> CREATE TABLE >>> >>>>>>>> test=# >>> >>>>>>>> test=# TRUNCATE mvcc_demo; >>> >>>>>>>> TRUNCATE TABLE >>> >>>>>>>> test=# >>> >>>>>>>> test=# BEGIN; >>> >>>>>>>> BEGIN >>> >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show >>> that >>> >>>>>>>> combo id would be different >>> >>>>>>>> DELETE 0 >>> >>>>>>>> test=# DELETE FROM mvcc_demo; >>> >>>>>>>> DELETE 0 >>> >>>>>>>> test=# DELETE FROM mvcc_demo; >>> >>>>>>>> DELETE 0 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+------+-----------+------------- >>> >>>>>>>> 80685 | 0 | 3 | f >>> >>>>>>>> 80685 | 0 | 4 | f >>> >>>>>>>> 80685 | 0 | 5 | f >>> >>>>>>>> (3 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# DELETE FROM mvcc_demo; >>> >>>>>>>> DELETE 3 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+-------+-----------+------------- >>> >>>>>>>> 80685 | 80685 | 0 | t >>> >>>>>>>> 80685 | 80685 | 1 | t >>> >>>>>>>> 80685 | 80685 | 2 | t >>> >>>>>>>> (3 rows) >>> >>>>>>>> >>> >>>>>>>> Note that since is_combocid is true the numbers are not >>> cmin/cmax >>> >>>>>>>> they are actually >>> >>>>>>>> the indexes of the internal array already explained above. >>> >>>>>>>> combo id index 0 would contain cmin 3, cmax 6 >>> >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 >>> >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# END; >>> >>>>>>>> COMMIT >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# TRUNCATE mvcc_demo; >>> >>>>>>>> TRUNCATE TABLE >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# BEGIN; >>> >>>>>>>> BEGIN >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+------+-----------+------------- >>> >>>>>>>> 80675 | 0 | 0 | f >>> >>>>>>>> 80675 | 0 | 1 | f >>> >>>>>>>> 80675 | 0 | 2 | f >>> >>>>>>>> (3 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>> >>>>>>>> UPDATE 3 >>> >>>>>>>> test=# >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+-------+-----------+------------- >>> >>>>>>>> 80675 | 80675 | 0 | t >>> >>>>>>>> 80675 | 80675 | 1 | t >>> >>>>>>>> 80675 | 80675 | 2 | t >>> >>>>>>>> 80675 | 0 | 3 | f >>> >>>>>>>> 80675 | 0 | 3 | f >>> >>>>>>>> 80675 | 0 | 3 | f >>> >>>>>>>> (6 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# END; >>> >>>>>>>> COMMIT >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# TRUNCATE mvcc_demo; >>> >>>>>>>> TRUNCATE TABLE >>> >>>>>>>> test=# >>> >>>>>>>> >>> >>>>>>>> -- From one psql issue >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & >>> X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+------+-----------+------------- >>> >>>>>>>> 80677 | 0 | 0 | f >>> >>>>>>>> (1 row) >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> test=# -- From another issue >>> >>>>>>>> test=# BEGIN; >>> >>>>>>>> BEGIN >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+------+-----------+------------- >>> >>>>>>>> 80677 | 0 | 0 | f >>> >>>>>>>> 80678 | 0 | 0 | f >>> >>>>>>>> 80678 | 0 | 1 | f >>> >>>>>>>> 80678 | 0 | 2 | f >>> >>>>>>>> (4 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>> >>>>>>>> UPDATE 4 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+-------+-----------+------------- >>> >>>>>>>> 80678 | 80678 | 0 | t >>> >>>>>>>> 80678 | 80678 | 1 | t >>> >>>>>>>> 80678 | 80678 | 2 | t >>> >>>>>>>> 80677 | 80678 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> (8 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> >>> >>>>>>>> test=# -- Before finishing this, issue these from the first >>> psql >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & >>> X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+-------+-----------+------------- >>> >>>>>>>> 80678 | 80678 | 0 | t >>> >>>>>>>> 80678 | 80678 | 1 | t >>> >>>>>>>> 80678 | 80678 | 2 | t >>> >>>>>>>> 80677 | 80678 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> (8 rows) >>> >>>>>>>> >>> >>>>>>>> test=# END; >>> >>>>>>>> COMMIT >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> Now consider the case we are trying to solve >>> >>>>>>>> >>> >>>>>>>> drop table tt1; >>> >>>>>>>> create table tt1(f1 int); >>> >>>>>>>> >>> >>>>>>>> BEGIN; >>> >>>>>>>> insert into tt1 values(1); >>> >>>>>>>> declare c50 cursor for select * from tt1; -- should show one >>> row >>> >>>>>>>> only >>> >>>>>>>> insert into tt1 values(2); >>> >>>>>>>> fetch all from c50; >>> >>>>>>>> COMMIT; >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> Consider Data node 1 log >>> >>>>>>>> >>> >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >>> read >>> >>>>>>>> committed READ WRITE] >>> >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >>> >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>> >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>> >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >>> read >>> >>>>>>>> committed READ WRITE] >>> >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>> >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>> >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>> >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >>> read >>> >>>>>>>> committed READ WRITE] >>> >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>> >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>> >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, >>> pgxc_node_str() >>> >>>>>>>> FROM tt1] >>> >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>> >>>>>>>> >>> >>>>>>>> The cursor currently shows both inserted rows because command >>> id at >>> >>>>>>>> data node in >>> >>>>>>>> step (j) is 0 >>> >>>>>>>> step (k) is 1 & >>> >>>>>>>> step (l) is 2 >>> >>>>>>>> >>> >>>>>>>> Where as we need command ids to be >>> >>>>>>>> >>> >>>>>>>> step (j) should be 0 >>> >>>>>>>> step (k) should be 2 & >>> >>>>>>>> step (l) should be 1 >>> >>>>>>>> >>> >>>>>>>> This will solve the cursor visibility problem. >>> >>>>>>>> >>> >>>>>>>> To implement this I suggest we send command IDs to data nodes >>> from >>> >>>>>>>> the coordinator >>> >>>>>>>> like we send gxid. The only difference will be that we do not >>> need >>> >>>>>>>> to take command IDs >>> >>>>>>>> from GTM since they are only valid with in the transaction. >>> >>>>>>>> >>> >>>>>>>> See this example >>> >>>>>>>> >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> ------+------+------+------+---- >>> >>>>>>>> (0 rows) >>> >>>>>>>> >>> >>>>>>>> test=# begin; >>> >>>>>>>> BEGIN >>> >>>>>>>> test=# insert into tt1 values(1); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> (1 row) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(2); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> (2 rows) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(3); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> (3 rows) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(4); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>> >>>>>>>> (4 rows) >>> >>>>>>>> >>> >>>>>>>> test=# end; >>> >>>>>>>> COMMIT >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>> >>>>>>>> (4 rows) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(5); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >>> >>>>>>>> (5 rows) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(6); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >>> >>>>>>>> 80617 | 0 | 0 | 0 | 6 >>> >>>>>>>> (6 rows) >>> >>>>>>>> >>> >>>>>>>> Note that at the end of the multi-statement transaction the >>> command >>> >>>>>>>> id gets reset to zero. >>> >>>>>>>> >>> >>>>>>>> -- >>> >>>>>>>> Abbas >>> >>>>>>>> Architect >>> >>>>>>>> EnterpriseDB Corporation >>> >>>>>>>> The Enterprise PostgreSQL Company >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> -- >>> >>>>>>> -- >>> >>>>>>> Abbas >>> >>>>>>> Architect >>> >>>>>>> EnterpriseDB Corporation >>> >>>>>>> The Enterprise PostgreSQL Company >>> >>>>>>> >>> >>>>>>> Phone: 92-334-5100153 >>> >>>>>>> >>> >>>>>>> Website: www.enterprisedb.com >>> >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>>>>>> >>> >>>>>>> This e-mail message (and any attachment) is intended for the use >>> of >>> >>>>>>> the individual or entity to whom it is addressed. This message >>> >>>>>>> contains information from EnterpriseDB Corporation that may be >>> >>>>>>> privileged, confidential, or exempt from disclosure under >>> applicable >>> >>>>>>> law. If you are not the intended recipient or authorized to >>> receive >>> >>>>>>> this for the intended recipient, any use, dissemination, >>> >>>>>>> distribution, >>> >>>>>>> retention, archiving, or copying of this communication is >>> strictly >>> >>>>>>> prohibited. If you have received this e-mail in error, please >>> notify >>> >>>>>>> the sender immediately by reply e-mail and delete this message. >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> ------------------------------------------------------------------------------ >>> >>>>>>> Live Security Virtual Conference >>> >>>>>>> Exclusive live event will cover all the ways today's security and >>> >>>>>>> threat landscape has changed and how IT managers can respond. >>> >>>>>>> Discussions >>> >>>>>>> will include endpoint security, mobile security and the latest in >>> >>>>>>> malware >>> >>>>>>> threats. >>> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> >>>>>>> _______________________________________________ >>> >>>>>>> Postgres-xc-developers mailing list >>> >>>>>>> Pos...@li... >>> >>>>>>> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> -- >>> >>>>>> Michael Paquier >>> >>>>>> http://michael.otacoo.com >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> -- >>> >>>>> -- >>> >>>>> Abbas >>> >>>>> Architect >>> >>>>> EnterpriseDB Corporation >>> >>>>> The Enterprise PostgreSQL Company >>> >>>>> >>> >>>>> Phone: 92-334-5100153 >>> >>>>> >>> >>>>> Website: www.enterprisedb.com >>> >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>>>> >>> >>>>> This e-mail message (and any attachment) is intended for the use of >>> >>>>> the individual or entity to whom it is addressed. This message >>> >>>>> contains information from EnterpriseDB Corporation that may be >>> >>>>> privileged, confidential, or exempt from disclosure under >>> applicable >>> >>>>> law. If you are not the intended recipient or authorized to receive >>> >>>>> this for the intended recipient, any use, dissemination, >>> distribution, >>> >>>>> retention, archiving, or copying of this communication is strictly >>> >>>>> prohibited. If you have received this e-mail in error, please >>> notify >>> >>>>> the sender immediately by reply e-mail and delete this message. >>> >>>>> >>> >>>>> >>> >>>>> >>> ------------------------------------------------------------------------------ >>> >>>>> Live Security Virtual Conference >>> >>>>> Exclusive live event will cover all the ways today's security and >>> >>>>> threat landscape has changed and how IT managers can respond. >>> >>>>> Discussions >>> >>>>> will include endpoint security, mobile security and the latest in >>> >>>>> malware >>> >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> >>>>> _______________________________________________ >>> >>>>> Postgres-xc-developers mailing list >>> >>>>> Pos...@li... >>> >>>>> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> -- >>> >>>> Best Wishes, >>> >>>> Ashutosh Bapat >>> >>>> EntepriseDB Corporation >>> >>>> The Enterprise Postgres Company >>> >>>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> >>> -- >>> >>> Abbas >>> >>> Architect >>> >>> EnterpriseDB Corporation >>> >>> The Enterprise PostgreSQL Company >>> >>> >>> >>> Phone: 92-334-5100153 >>> >>> >>> >>> Website: www.enterprisedb.com >>> >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> >>> >>> This e-mail message (and any attachment) is intended for the use of >>> >>> the individual or entity to whom it is addressed. This message >>> >>> contains information from EnterpriseDB Corporation that may be >>> >>> privileged, confidential, or exempt from disclosure under applicable >>> >>> law. If you are not the intended recipient or authorized to receive >>> >>> this for the intended recipient, any use, dissemination, >>> distribution, >>> >>> retention, archiving, or copying of this communication is strictly >>> >>> prohibited. If you have received this e-mail in error, please notify >>> >>> the sender immediately by reply e-mail and delete this message. >>> >> >>> >> >>> >> >>> >> >>> >> -- >>> >> -- >>> >> Abbas >>> >> Architect >>> >> EnterpriseDB Corporation >>> >> The Enterprise PostgreSQL Company >>> >> >>> >> Phone: 92-334-5100153 >>> >> >>> >> Website: www.enterprisedb.com >>> >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> >> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >> >>> >> This e-mail message (and any attachment) is intended for the use of >>> >> the individual or entity to whom it is addressed. This message >>> >> contains information from EnterpriseDB Corporation that may be >>> >> privileged, confidential, or exempt from disclosure under applicable >>> >> law. If you are not the intended recipient or authorized to receive >>> >> this for the intended recipient, any use, dissemination, distribution, >>> >> retention, archiving, or copying of this communication is strictly >>> >> prohibited. If you have received this e-mail in error, please notify >>> >> the sender immediately by reply e-mail and delete this message. >>> > >>> > >>> > >>> > >>> > -- >>> > -- >>> > Abbas >>> > Architect >>> > EnterpriseDB Corporation >>> > The Enterprise PostgreSQL Company >>> > >>> > Phone: 92-334-5100153 >>> > >>> > Website: www.enterprisedb.com >>> > EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> > Follow us on Twitter: http://www.twitter.com/enterprisedb >>> > >>> > This e-mail message (and any attachment) is intended for the use of >>> > the individual or entity to whom it is addressed. This message >>> > contains information from EnterpriseDB Corporation that may be >>> > privileged, confidential, or exempt from disclosure under applicable >>> > law. If you are not the intended recipient or authorized to receive >>> > this for the intended recipient, any use, dissemination, distribution, >>> > retention, archiving, or copying of this communication is strictly >>> > prohibited. If you have received this e-mail in error, please notify >>> > the sender immediately by reply e-mail and delete this message. >>> > >>> > >>> ------------------------------------------------------------------------------ >>> > Live Security Virtual Conference >>> > Exclusive live event will cover all the ways today's security and >>> > threat landscape has changed and how IT managers can respond. >>> Discussions >>> > will include endpoint security, mobile security and the latest in >>> malware >>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> > _______________________________________________ >>> > Postgres-xc-developers mailing list >>> > Pos...@li... >>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> > >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Abbas B. <abb...@en...> - 2012-07-19 06:01:09
|
Thanks Michael for the review, Thanks Ashutosh for the comments, Michael I will accommodate your valuable feedback and will send revised patch. Ashutosh I will send the summarized design in the email with the revised patch. BTW enb is short for enable! On Thu, Jul 19, 2012 at 10:37 AM, Michael Paquier <mic...@gm... > wrote: > Hi, > > Finally I took some time to look at this code, it is important support. > 1) The code is not realigned with current master (xc_misc.sql) and > contains whitespaces (2 in xc_misc.out, perhaps other files as well). So I > realigned it myself to test the code. > 2) In GetCurrentCommandId xact.c, you need to set !IsConnFromCoord() with > (IS_PGXC_COORDINATOR) to avoid remote coordinators incrementing the command > ID. Remote Coordinators do not receive any command IDs from remote nodes. > Also here using an if/else if structure won't hurt and bring clarity in the > operations. > 3) In BeginTransactionBlock of xact.c, you need to add !IsConnFromCoord() > as a secondary condition to avoid remote Coordinator problems. > 4) By looking at the patch It is possible to reduce the number of > variables used for the control of Command ID. In the current version of > your patch, you are using 5 static variables: > static CommandId cmdIdFromCoord; /* Datanode will use command id > received from coordinator */ > static bool cmdIdRcvdAtDatanode; /* Has the coordinator sent > command id to the data node? */ > static bool enbCmdIdChgReporting; /* Should datanode report > command id changes to coordinator? */ > static bool enbCoordCmdIdSending; /* Should coordinator send > command id to the data node? */ > static CommandId cmdIdFromDatanode; /* Coordinator will receive > updated command id from datanode */ > However, cmdIdRcvdAtDatanode, cmdIdFromCoord and enbCmdIdChgReporting are > used only at Datanode. > cmdIdFromDatanode and enbCoordCmdIdSending are only used at Coordinator > level. > You should combine enbCoordCmdIdSending and enbCmdIdChgReporting into a > single variable with a generic name, like let's say sendCommandId or > reportCommandId. Specifying in comments the role of this unique variable > for local Coordinator and remote nodes will help. Those 2 variables play > the same role which is to control if a > In the case of Coordinator, this variable is flagged to true when > transforming INSERT statements (transformInsertStmt) and when a transaction > block is begun (BeginTransactionBlock). > In the case of a Datanode, this variable is flagged to true only when a > command ID has been received from Coordinator. > > Then, you should combine cmdIdFromCoord and cmdIdFromDatanode into a > single variable, like receivedCommandId. In this case, being at Coordinator > or at Datanode means that this command ID has been received from a remode > node. If you have other ideas for names please feel free! > Also changing the name cmdIdRcvdAtDatanode to isCommandIdReceived might > help... > Just a question, what means the preffix enb? (just by curiosity) :) > > 5) In xact.c, at the end of CommitTransaction, PrepareTransaction, > CleanupTransaction, you will need to change the cleanup of the variables if > you take into account comment 4). > 6) In BeginTransactionBlock, PrepareTransactionBlock, EndTransactionBlock > and UserAbortTransactionBlock of xact.c, you need to set an additional > !IsConnFromCoord() at this place, this will avoid remote Coordinators > trying to send CommandIds to inexistent nodes. > if(IS_PGXC_COORDINATOR) > + SetCmdIdSending(false); > > 7) You should combine the messages 'Y' and 'M'. The content and the goals > are the same. You can also eliminate the 2nd byte of 'Y' without > consequences I think. > 8) Why didn't you put the reception of message 'M' in PostgresMain of > postgres.c at the same place as the reception of GXID, snapshot, timestamp, > etc. > Putting the reception of message at the end of SocketBackend makes the > code kind of inconsistent. If you keep it inside SocketBackend, you should > at least manage the message inside the switch, however I do not see any > particular reason why you receive this message in SocketBackend and not in > PostgresMain. So anything I should know? > You should also remove the flag IS_PGXC_DATANODE when receiving the > command ID. Remote Coordinator might need to be able to receive command IDs > as well, even if there is no direct usage now, but for stuff like triggers?? > 9) I found some diffs with the test select_views. It might be worth to > look at what is happening. It is perhaps an environment-related issue. Btw, > my regression diffs are attached. > 10) The modifications you are bringing to combocid_1.out are consistent > with postgres, cool! > > No performance impact with this patch! Tested and approved. This is good. > No regression impact with this patch. > > So, to conclude, I think it is a good piece of work, really cool stuff > that I believe will be helpful for triggers and even other things. > It just needs to be polished and simplified a bit. However basics are > here, and performance is not impacted. Really it was worth spending time on > that. The additional test cases also look to be sufficient, but I might be > missing smth so feel free to add new tests if necessary. > Thanks! > > -- > Michael Paquier > http://michael.otacoo.com > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Abbas B. <abb...@en...> - 2012-07-19 17:16:07
Attachments:
17_cvp.patch
|
-------------------- 1. Background -------------------- In a multi-statement transaction each statement is given a command identifier starting from zero and incrementing for each statement. These command indentifers are required for extra tracking because each statement has its own visibility rules with in the transaction. For example, a cursor’s contents must remain unchanged even if later statements in the same transaction modify rows. Such tracking is implemented using system command id columns cmin/cmax, which is internally actually is a single column. cmin/cmax come into play in case of multi-statement transactions only, they are both zero otherwise. cmin "The command identifier of the statement within the inserting transaction." cmax "The command identifier of the statement within the deleting transaction." Here are the visibility rules (taken from comments of tqual.c) ( // A heap tuple is valid "now" iff Xmin == my-transaction && // inserted by the current transaction Cmin < my-command && // before this command, and ( Xmax is null || // the row has not been deleted, or ( Xmax == my-transaction && // it was deleted by the current transaction Cmax >= my-command // but not before this command, ) ) ) || // or ( Xmin is committed && // the row was inserted by a committed transaction, and ( Xmax is null || // the row has not been deleted, or ( Xmax == my-transaction && // the row is being deleted by this transaction Cmax >= my-command) || // but it's not deleted "yet", or ( Xmax != my-transaction && // the row was deleted by another transaction Xmax is not committed // that has not been committed ) ) ) ) Because cmin and cmax are internally a single system column, it is therefore not possible to simply record the status of a row that is created and expired in the same multi-statement transaction. For that reason, a special combo command id is created that references a local memory hash that contains the actual cmin and cmax values. It means that if combo id is being used the number we are seeing would not be the cmin or cmax it will be an index into a local array that contains a structure with has the actual cmin and cmax values. -------------------- 2. Cursor Visibility Problem (CVP) -------------------- Consider the test case create table tt1(f1 int); BEGIN; insert into tt1 values(1); declare c50 cursor for select * from tt1; -- should show one row only insert into tt1 values(2); fetch all from c50; COMMIT; The fetch shows both inserted rows, where as it is supposed to show only 1. -------------------- 3. Solution of CVP -------------------- Consider Data node 1 log (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE] (b) [exec_simple_query][1026][drop table tt1;] (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE] (f) [exec_simple_query][1026][create table tt1(f1 int);] (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE] (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM tt1] (m) [exec_simple_query][1026][COMMIT TRANSACTION] The cursor currently shows both inserted rows because command id at data node in step (j) is 0 step (k) is 1 & step (l) is 2 Where as we need command ids to be step (j) should be 0 step (k) should be 2 & step (l) should be 1 This will solve the cursor visibility problem. To implement this we send command IDs to data nodes from the coordinator like we send gxid. The only difference will be that we do not need to take command IDs from GTM since they are only valid with in the transaction. A new message 'M' is used for this purpose. However sending command ids from coordinator to data node alone is not sufficient because data node can increment command ids e.g. in case of triggers fired at the data node or rules etc. These changes must be brought into the notice of the coordinator for it to decide the command id of the next command correctly. Hence we send changes in command ids from data node to the coordinator as well. The coordinator chooses the largest command id for the next command to issue. A new message 'Y' is used for this purpose. -------------------- 4. INSERT-SELECT Problem -------------------- Consider the use of INSERT-SELECT in case rows are inserted into a child by selecting from its parent. In this case XC was transforming the single INSERT-SELECT command into a multi-statement transaction block on the data nodes. [STEP 1.] START TRANSACTION ISOLATION LEVEL read committed READ WRITE [STEP 2.] SELECT col1,col2 FROM ONLY parent_table WHERE true [STEP 3.] INSERT INTO public.child_table (col1, col2) VALUES ($1, $2) [STEP 4.] (-- bind and execute here --) [STEP 5.] SELECT col1, col2 FROM ONLY child_table WHERE true [STEP 6.] INSERT INTO public.child_table (col1, col2) VALUES ($1, $2) [STEP 7.] (-- bind and execute here --) [STEP 8.] COMMIT TRANSACTION The inserts of step 3 will be visible to select of step 5, hence we were getting duplicate rows in the child table. -------------------- 5. Solution of INSERT-SELECT Problem -------------------- Note down the command id of the insert-select and use that as the command id of select query in [STEP 5.]. The infrastructure developed to exchange command ids is used to do so. |
From: Michael P. <mic...@gm...> - 2012-07-20 06:04:13
|
Hi, Sorry for being a little bit picky here: 1) In GetCurrentSubTransactionId of xact.c, you can replace: if (IS_PGXC_DATANODE && isCommandIdReceived) { foo } else { if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) re_foo } by: if (IS_PGXC_DATANODE && isCommandIdReceived) { foo } else if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { re_foo } 2) I am not so much a fan of this 'Y' message as for now it just has a single usage, and M is doing exactly the same thing... But seen as a message that a remote node sends a notification to its backend, ok it makes some sense. In this case you should avoid to use a magic number like '1'. Please use a define like: #define DATANODE_NOTIFICATION_COMMAND_ID '1' A place like pgxc.h would make sense to define that. Then use this variable at the places where you need it (HandleDatanodeNotification of execRemote.c and ReportCommandIdChange of xact.c). It will bring more visibility to your code. 3) SetCmdIdSending and SetCmdIdReporting should be combined, they serve the same purpose. Their only difference is the node type where they are used. Let's make that generic. As a general name, SetCommandIdSending would be enough. 4) GetCmdIdOfDatanode and GetCmdIdOfCoordinator should also be combined. They do exactly the same thing, the only difference is that one is launched on Datanode and the other on Coordinator. A general name like GetCommandIdReceived would be nicer. 5) In portalcmds.c, PerformCursorOpen. You need to replace if (IS_PGXC_COORDINATOR) GetCurrentCommandId(true); by: if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) GetCurrentCommandId(true); 6) For the regressions, I am still getting the diffs. Based on your test, I get this ordering for PG and XC: pgxc# select * from test_text order by 1; a ------------------------------------ Bancroft Ave Bancroft Ave Birch St Birch St Blacow Road Bridgepointe Dr Broadmore Ave Broadway B St (9 rows) If we are getting the same diffs we should change the output, it looks to be a side-effect of your fix for INSERT SELECT on parent-child, On Fri, Jul 20, 2012 at 2:15 AM, Abbas Butt <abb...@en...>wrote: > Attached please find the updated patch. > > On Thu, Jul 19, 2012 at 11:01 AM, Abbas Butt <abb...@en...>wrote: > >> Thanks Michael for the review, Thanks Ashutosh for the comments, >> Michael I will accommodate your valuable feedback and will send revised >> patch. >> Ashutosh I will send the summarized design in the email with the revised >> patch. >> BTW enb is short for enable! >> >> >> On Thu, Jul 19, 2012 at 10:37 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi, >>> >>> Finally I took some time to look at this code, it is important support. >>> 1) The code is not realigned with current master (xc_misc.sql) and >>> contains whitespaces (2 in xc_misc.out, perhaps other files as well). So I >>> realigned it myself to test the code. >>> >> > The attached patch is aligned with the current master and I have removed > white spaces. > > >> 2) In GetCurrentCommandId xact.c, you need to set !IsConnFromCoord() >>> with (IS_PGXC_COORDINATOR) to avoid remote coordinators incrementing the >>> command ID. Remote Coordinators do not receive any command IDs from remote >>> nodes. Also here using an if/else if structure won't hurt and bring clarity >>> in the operations. >>> >> > Added the check and used if/else > > >> 3) In BeginTransactionBlock of xact.c, you need to add >>> !IsConnFromCoord() as a secondary condition to avoid remote Coordinator >>> problems. >>> >> > Done > > >> 4) By looking at the patch It is possible to reduce the number of >>> variables used for the control of Command ID. In the current version of >>> your patch, you are using 5 static variables: >>> static CommandId cmdIdFromCoord; /* Datanode will use command >>> id received from coordinator */ >>> static bool cmdIdRcvdAtDatanode; /* Has the coordinator sent >>> command id to the data node? */ >>> static bool enbCmdIdChgReporting; /* Should datanode report >>> command id changes to coordinator? */ >>> static bool enbCoordCmdIdSending; /* Should coordinator send >>> command id to the data node? */ >>> static CommandId cmdIdFromDatanode; /* Coordinator will receive >>> updated command id from datanode */ >>> However, cmdIdRcvdAtDatanode, cmdIdFromCoord and enbCmdIdChgReporting >>> are used only at Datanode. >>> cmdIdFromDatanode and enbCoordCmdIdSending are only used at Coordinator >>> level. >>> You should combine enbCoordCmdIdSending and enbCmdIdChgReporting into a >>> single variable with a generic name, like let's say sendCommandId or >>> reportCommandId. Specifying in comments the role of this unique variable >>> for local Coordinator and remote nodes will help. Those 2 variables play >>> the same role which is to control if a >>> In the case of Coordinator, this variable is flagged to true when >>> transforming INSERT statements (transformInsertStmt) and when a transaction >>> block is begun (BeginTransactionBlock). >>> In the case of a Datanode, this variable is flagged to true only when a >>> command ID has been received from Coordinator. >>> >>> > Done, I used functions to retain code readability and inside the functions > I used the same variables as u suggested. > > >> Then, you should combine cmdIdFromCoord and cmdIdFromDatanode into a >>> single variable, like receivedCommandId. In this case, being at Coordinator >>> or at Datanode means that this command ID has been received from a remode >>> node. If you have other ideas for names please feel free! >>> >> > Done, again I used functions to retain code readability. I however had to > add checks inside the functions to make sure that the functions intended > for data node work only for data nodes and the ones intended for > coordinators work only for coordinators. > > >> Also changing the name cmdIdRcvdAtDatanode to isCommandIdReceived might >>> help... >>> >> > Done > > >> Just a question, what means the preffix enb? (just by curiosity) :) >>> >>> 5) In xact.c, at the end of CommitTransaction, PrepareTransaction, >>> CleanupTransaction, you will need to change the cleanup of the variables if >>> you take into account comment 4). >>> >> > Done > > >> 6) In BeginTransactionBlock, PrepareTransactionBlock, EndTransactionBlock >>> and UserAbortTransactionBlock of xact.c, you need to set an additional >>> !IsConnFromCoord() at this place, this will avoid remote Coordinators >>> trying to send CommandIds to inexistent nodes. >>> if(IS_PGXC_COORDINATOR) >>> + SetCmdIdSending(false); >>> >> > Done > > >> >>> 7) You should combine the messages 'Y' and 'M'. The content and the >>> goals are the same. You can also eliminate the 2nd byte of 'Y' without >>> consequences I think. >>> >> > I did not combine these two. The intention of having them separate is > because 'Y' should not be tied to 'M', In future we might need data nodes > to notify coordinators of some thing else. And for the same reason I have > added a second byte in 'Y' so that we can have the same command notify > coordinators of some thing else. > > >> 8) Why didn't you put the reception of message 'M' in PostgresMain of >>> postgres.c at the same place as the reception of GXID, snapshot, timestamp, >>> etc. >>> Putting the reception of message at the end of SocketBackend makes the >>> code kind of inconsistent. If you keep it inside SocketBackend, you should >>> at least manage the message inside the switch, however I do not see any >>> particular reason why you receive this message in SocketBackend and not in >>> PostgresMain. So anything I should know? >>> You should also remove the flag IS_PGXC_DATANODE when receiving the >>> command ID. Remote Coordinator might need to be able to receive command IDs >>> as well, even if there is no direct usage now, but for stuff like triggers?? >>> >> > Done, I added handling of 'M' in PostgresMain. > > >> 9) I found some diffs with the test select_views. It might be worth to >>> look at what is happening. It is perhaps an environment-related issue. Btw, >>> my regression diffs are attached. >>> >> > I was also getting these diffs but it was difficult for me to think that > the changes I did could cause a change in oder of rows. Here is what I did > to dig the issue deeper. > > create table test_text(a text); > > insert into test_text values('Bancroft Ave'); > insert into test_text values('Bancroft Ave'); > insert into test_text values('Birch St'); > insert into test_text values('Birch St'); > insert into test_text values('Blacow Road'); > insert into test_text values('Bridgepointe Dr'); > insert into test_text values('Broadmore Ave'); > insert into test_text values('Broadway '); > insert into test_text values('B St'); > > select * from test_text order by 1 > produces same results with plain PG and XC > > Also I changed the query in plain PG select_views test case to > SELECT * FROM street ORDER BY name,cname,thepath::text; > and obtained same results with plain PG and XC. > > I am not sure why the test case was passing earlier with the current > possible expected outputs. > > Summarized design notes are attached with the mail for reference. > > 10) The modifications you are bringing to combocid_1.out are consistent >>> with postgres, cool! >>> >>> No performance impact with this patch! Tested and approved. This is good. >>> No regression impact with this patch. >>> >>> So, to conclude, I think it is a good piece of work, really cool stuff >>> that I believe will be helpful for triggers and even other things. >>> It just needs to be polished and simplified a bit. However basics are >>> here, and performance is not impacted. Really it was worth spending time on >>> that. The additional test cases also look to be sufficient, but I might be >>> missing smth so feel free to add new tests if necessary. >>> Thanks! >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- Michael Paquier http://michael.otacoo.com |
From: Abbas B. <abb...@en...> - 2012-07-21 09:29:30
Attachments:
18_cvp.patch
|
On Fri, Jul 20, 2012 at 11:04 AM, Michael Paquier <mic...@gm... > wrote: > Hi, > > Sorry for being a little bit picky here: > 1) In GetCurrentSubTransactionId of xact.c, you can replace: > if (IS_PGXC_DATANODE && isCommandIdReceived) > { > foo > } > else > { > if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) > re_foo > } > by: > if (IS_PGXC_DATANODE && isCommandIdReceived) > { > foo > } > else if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) > { > re_foo > } > Done > 2) I am not so much a fan of this 'Y' message as for now it just has a > single usage, and M is doing exactly the same thing... But seen as a > message that a remote node sends a notification to its backend, ok it makes > some sense. In this case you should avoid to use a magic number like '1'. > Please use a define like: > #define DATANODE_NOTIFICATION_COMMAND_ID '1' > A place like pgxc.h would make sense to define that. > Then use this variable at the places where you need it > (HandleDatanodeNotification of execRemote.c and ReportCommandIdChange of > xact.c). It will bring more visibility to your code. > Done > 3) SetCmdIdSending and SetCmdIdReporting should be combined, they serve > the same purpose. Their only difference is the node type where they are > used. Let's make that generic. As a general name, SetCommandIdSending would > be enough. > I am not getting how to do that, consider the example of StartTransaction function where we want to SetCmdIdReporting(false). Suppose we combine both functions and have the new function like this void SetCmdIdSending(bool enb) { if(IS_PGXC_COORDINATOR && !IsConnFromCoord()) sendCommandId = enb; else if(IS_PGXC_DATANODE) sendCommandId = enb; } and that we call this function instead of SetCmdIdReporting. It will work OK for data node but when the same code executes for coordinator we will set sendCommandId to false which was not required. Perhaps I did not get your point. > 4) GetCmdIdOfDatanode and GetCmdIdOfCoordinator should also be combined. > They do exactly the same thing, the only difference is that one is launched > on Datanode and the other on Coordinator. A general name like > GetCommandIdReceived would be nicer. > See my comment for point (3) 5) In portalcmds.c, PerformCursorOpen. You need to replace > if (IS_PGXC_COORDINATOR) > GetCurrentCommandId(true); > by: > if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) > GetCurrentCommandId(true); > Done. > 6) For the regressions, I am still getting the diffs. Based on your test, > I get this ordering for PG and XC: > pgxc# select * from test_text order by 1; > a > ------------------------------------ > Bancroft Ave > Bancroft Ave > Birch St > Birch St > Blacow Road > Bridgepointe Dr > Broadmore Ave > Broadway > B St > (9 rows) > If we are getting the same diffs we should change the output, it looks to > be a side-effect of your fix for INSERT SELECT on parent-child, > Done > > On Fri, Jul 20, 2012 at 2:15 AM, Abbas Butt <abb...@en...>wrote: > >> Attached please find the updated patch. >> >> On Thu, Jul 19, 2012 at 11:01 AM, Abbas Butt <abb...@en... >> > wrote: >> >>> Thanks Michael for the review, Thanks Ashutosh for the comments, >>> Michael I will accommodate your valuable feedback and will send revised >>> patch. >>> Ashutosh I will send the summarized design in the email with the revised >>> patch. >>> BTW enb is short for enable! >>> >>> >>> On Thu, Jul 19, 2012 at 10:37 AM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi, >>>> >>>> Finally I took some time to look at this code, it is important support. >>>> 1) The code is not realigned with current master (xc_misc.sql) and >>>> contains whitespaces (2 in xc_misc.out, perhaps other files as well). So I >>>> realigned it myself to test the code. >>>> >>> >> The attached patch is aligned with the current master and I have removed >> white spaces. >> >> >>> 2) In GetCurrentCommandId xact.c, you need to set !IsConnFromCoord() >>>> with (IS_PGXC_COORDINATOR) to avoid remote coordinators incrementing the >>>> command ID. Remote Coordinators do not receive any command IDs from remote >>>> nodes. Also here using an if/else if structure won't hurt and bring clarity >>>> in the operations. >>>> >>> >> Added the check and used if/else >> >> >>> 3) In BeginTransactionBlock of xact.c, you need to add >>>> !IsConnFromCoord() as a secondary condition to avoid remote Coordinator >>>> problems. >>>> >>> >> Done >> >> >>> 4) By looking at the patch It is possible to reduce the number of >>>> variables used for the control of Command ID. In the current version of >>>> your patch, you are using 5 static variables: >>>> static CommandId cmdIdFromCoord; /* Datanode will use command >>>> id received from coordinator */ >>>> static bool cmdIdRcvdAtDatanode; /* Has the coordinator sent >>>> command id to the data node? */ >>>> static bool enbCmdIdChgReporting; /* Should datanode report >>>> command id changes to coordinator? */ >>>> static bool enbCoordCmdIdSending; /* Should coordinator send >>>> command id to the data node? */ >>>> static CommandId cmdIdFromDatanode; /* Coordinator will receive >>>> updated command id from datanode */ >>>> However, cmdIdRcvdAtDatanode, cmdIdFromCoord and enbCmdIdChgReporting >>>> are used only at Datanode. >>>> cmdIdFromDatanode and enbCoordCmdIdSending are only used at Coordinator >>>> level. >>>> You should combine enbCoordCmdIdSending and enbCmdIdChgReporting into a >>>> single variable with a generic name, like let's say sendCommandId or >>>> reportCommandId. Specifying in comments the role of this unique variable >>>> for local Coordinator and remote nodes will help. Those 2 variables play >>>> the same role which is to control if a >>>> In the case of Coordinator, this variable is flagged to true when >>>> transforming INSERT statements (transformInsertStmt) and when a transaction >>>> block is begun (BeginTransactionBlock). >>>> In the case of a Datanode, this variable is flagged to true only when a >>>> command ID has been received from Coordinator. >>>> >>>> >> Done, I used functions to retain code readability and inside the >> functions I used the same variables as u suggested. >> >> >>> Then, you should combine cmdIdFromCoord and cmdIdFromDatanode into a >>>> single variable, like receivedCommandId. In this case, being at Coordinator >>>> or at Datanode means that this command ID has been received from a remode >>>> node. If you have other ideas for names please feel free! >>>> >>> >> Done, again I used functions to retain code readability. I however had to >> add checks inside the functions to make sure that the functions intended >> for data node work only for data nodes and the ones intended for >> coordinators work only for coordinators. >> >> >>> Also changing the name cmdIdRcvdAtDatanode to isCommandIdReceived might >>>> help... >>>> >>> >> Done >> >> >>> Just a question, what means the preffix enb? (just by curiosity) :) >>>> >>>> 5) In xact.c, at the end of CommitTransaction, PrepareTransaction, >>>> CleanupTransaction, you will need to change the cleanup of the variables if >>>> you take into account comment 4). >>>> >>> >> Done >> >> >>> 6) In BeginTransactionBlock, PrepareTransactionBlock, >>>> EndTransactionBlock and UserAbortTransactionBlock of xact.c, you need to >>>> set an additional !IsConnFromCoord() at this place, this will avoid remote >>>> Coordinators trying to send CommandIds to inexistent nodes. >>>> if(IS_PGXC_COORDINATOR) >>>> + SetCmdIdSending(false); >>>> >>> >> Done >> >> >>> >>>> 7) You should combine the messages 'Y' and 'M'. The content and the >>>> goals are the same. You can also eliminate the 2nd byte of 'Y' without >>>> consequences I think. >>>> >>> >> I did not combine these two. The intention of having them separate is >> because 'Y' should not be tied to 'M', In future we might need data nodes >> to notify coordinators of some thing else. And for the same reason I have >> added a second byte in 'Y' so that we can have the same command notify >> coordinators of some thing else. >> >> >>> 8) Why didn't you put the reception of message 'M' in PostgresMain of >>>> postgres.c at the same place as the reception of GXID, snapshot, timestamp, >>>> etc. >>>> Putting the reception of message at the end of SocketBackend makes the >>>> code kind of inconsistent. If you keep it inside SocketBackend, you should >>>> at least manage the message inside the switch, however I do not see any >>>> particular reason why you receive this message in SocketBackend and not in >>>> PostgresMain. So anything I should know? >>>> You should also remove the flag IS_PGXC_DATANODE when receiving the >>>> command ID. Remote Coordinator might need to be able to receive command IDs >>>> as well, even if there is no direct usage now, but for stuff like triggers?? >>>> >>> >> Done, I added handling of 'M' in PostgresMain. >> >> >>> 9) I found some diffs with the test select_views. It might be worth to >>>> look at what is happening. It is perhaps an environment-related issue. Btw, >>>> my regression diffs are attached. >>>> >>> >> I was also getting these diffs but it was difficult for me to think that >> the changes I did could cause a change in oder of rows. Here is what I did >> to dig the issue deeper. >> >> create table test_text(a text); >> >> insert into test_text values('Bancroft Ave'); >> insert into test_text values('Bancroft Ave'); >> insert into test_text values('Birch St'); >> insert into test_text values('Birch St'); >> insert into test_text values('Blacow Road'); >> insert into test_text values('Bridgepointe Dr'); >> insert into test_text values('Broadmore Ave'); >> insert into test_text values('Broadway '); >> insert into test_text values('B St'); >> >> select * from test_text order by 1 >> produces same results with plain PG and XC >> >> Also I changed the query in plain PG select_views test case to >> SELECT * FROM street ORDER BY name,cname,thepath::text; >> and obtained same results with plain PG and XC. >> >> I am not sure why the test case was passing earlier with the current >> possible expected outputs. >> >> Summarized design notes are attached with the mail for reference. >> >> 10) The modifications you are bringing to combocid_1.out are consistent >>>> with postgres, cool! >>>> >>>> No performance impact with this patch! Tested and approved. This is >>>> good. >>>> No regression impact with this patch. >>>> >>>> So, to conclude, I think it is a good piece of work, really cool stuff >>>> that I believe will be helpful for triggers and even other things. >>>> It just needs to be polished and simplified a bit. However basics are >>>> here, and performance is not impacted. Really it was worth spending time on >>>> that. The additional test cases also look to be sufficient, but I might be >>>> missing smth so feel free to add new tests if necessary. >>>> Thanks! >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >>> >>> Phone: 92-334-5100153 >>> >>> Website: www.enterprisedb.com >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > Michael Paquier > http://michael.otacoo.com > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Abbas B. <abb...@en...> - 2012-07-23 05:05:41
|
Thanks Michael, I just reviewed the patch and found one small thing. In the function transformInsertStmt where we set SetSendCommandId(true); I think it would be better to put this statement under an if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) Other than that I am OK with the changes. Regards On Mon, Jul 23, 2012 at 7:08 AM, Michael Paquier <mic...@gm...>wrote: > Hi, > > I spent some time this morning rewriting your patch. > I reduced the number of functions and parameters to a minimum number and > eliminated the useless message 'Y'. > 'Y' definitely doesn't make sense as it has absolutely the same role as > 'M'. In the former patch, the notion of notification is simply to send a > message from Coordinator to Datanode, but a Datanode can also send a GXID, > snapshot and timestamp using the same message types as the one used when a > Coordinator sends data to a remote node. So we shouldn't lose this > portability of the new messages. The only thing that changes for > Coordinator->remoteNode and remoteCode->Coordinator communication protocol > is the place where message is read. in the first case, message is received > by remote node in postgres.c stuff. In the second case, message is received > inside execRemote.c, so this distinction makes perfectly safe the fact of > using the same message 'M', making 'Y' unnecessary. > Also, it is really important to limit the number of used message types to > reduce the footprint of XC code on PostgreSQL. > > There are no problems with regressions, and performance is not impacted. I > also corrected some comment format and added some other comments at various > places. > In consequence, and to simplify the work of everybody, I am going to > commit this largely simplified version of the patch. > > Thanks, > > > On Sat, Jul 21, 2012 at 6:29 PM, Abbas Butt <abb...@en...>wrote: > >> >> >> On Fri, Jul 20, 2012 at 11:04 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi, >>> >>> Sorry for being a little bit picky here: >>> 1) In GetCurrentSubTransactionId of xact.c, you can replace: >>> if (IS_PGXC_DATANODE && isCommandIdReceived) >>> { >>> foo >>> } >>> else >>> { >>> if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>> re_foo >>> } >>> by: >>> if (IS_PGXC_DATANODE && isCommandIdReceived) >>> { >>> foo >>> } >>> else if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>> { >>> re_foo >>> } >>> >> >> Done >> >> >>> 2) I am not so much a fan of this 'Y' message as for now it just has a >>> single usage, and M is doing exactly the same thing... But seen as a >>> message that a remote node sends a notification to its backend, ok it makes >>> some sense. In this case you should avoid to use a magic number like '1'. >>> Please use a define like: >>> #define DATANODE_NOTIFICATION_COMMAND_ID '1' >>> A place like pgxc.h would make sense to define that. >>> Then use this variable at the places where you need it >>> (HandleDatanodeNotification of execRemote.c and ReportCommandIdChange of >>> xact.c). It will bring more visibility to your code. >>> >> >> Done >> >> >>> 3) SetCmdIdSending and SetCmdIdReporting should be combined, they serve >>> the same purpose. Their only difference is the node type where they are >>> used. Let's make that generic. As a general name, SetCommandIdSending would >>> be enough. >>> >> >> I am not getting how to do that, consider the example of StartTransaction >> function where we want to SetCmdIdReporting(false). Suppose we combine both >> functions and have the new function like this >> >> void SetCmdIdSending(bool enb) >> { >> if(IS_PGXC_COORDINATOR && !IsConnFromCoord()) >> sendCommandId = enb; >> else if(IS_PGXC_DATANODE) >> sendCommandId = enb; >> } >> >> and that we call this function instead of SetCmdIdReporting. It will work >> OK for data node but when the same code executes for coordinator we will >> set sendCommandId to false which was not required. Perhaps I did not get >> your point. >> >> >>> 4) GetCmdIdOfDatanode and GetCmdIdOfCoordinator should also be combined. >>> They do exactly the same thing, the only difference is that one is launched >>> on Datanode and the other on Coordinator. A general name like >>> GetCommandIdReceived would be nicer. >>> >> >> See my comment for point (3) >> >> 5) In portalcmds.c, PerformCursorOpen. You need to replace >>> if (IS_PGXC_COORDINATOR) >>> GetCurrentCommandId(true); >>> by: >>> if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>> GetCurrentCommandId(true); >>> >> >> Done. >> >> >>> 6) For the regressions, I am still getting the diffs. Based on your >>> test, I get this ordering for PG and XC: >>> pgxc# select * from test_text order by 1; >>> a >>> ------------------------------------ >>> Bancroft Ave >>> Bancroft Ave >>> Birch St >>> Birch St >>> Blacow Road >>> Bridgepointe Dr >>> Broadmore Ave >>> Broadway >>> B St >>> (9 rows) >>> If we are getting the same diffs we should change the output, it looks >>> to be a side-effect of your fix for INSERT SELECT on parent-child, >>> >> >> Done >> >> >>> >>> On Fri, Jul 20, 2012 at 2:15 AM, Abbas Butt <abb...@en... >>> > wrote: >>> >>>> Attached please find the updated patch. >>>> >>>> On Thu, Jul 19, 2012 at 11:01 AM, Abbas Butt < >>>> abb...@en...> wrote: >>>> >>>>> Thanks Michael for the review, Thanks Ashutosh for the comments, >>>>> Michael I will accommodate your valuable feedback and will send >>>>> revised patch. >>>>> Ashutosh I will send the summarized design in the email with the >>>>> revised patch. >>>>> BTW enb is short for enable! >>>>> >>>>> >>>>> On Thu, Jul 19, 2012 at 10:37 AM, Michael Paquier < >>>>> mic...@gm...> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Finally I took some time to look at this code, it is important >>>>>> support. >>>>>> 1) The code is not realigned with current master (xc_misc.sql) and >>>>>> contains whitespaces (2 in xc_misc.out, perhaps other files as well). So I >>>>>> realigned it myself to test the code. >>>>>> >>>>> >>>> The attached patch is aligned with the current master and I have >>>> removed white spaces. >>>> >>>> >>>>> 2) In GetCurrentCommandId xact.c, you need to set !IsConnFromCoord() >>>>>> with (IS_PGXC_COORDINATOR) to avoid remote coordinators incrementing the >>>>>> command ID. Remote Coordinators do not receive any command IDs from remote >>>>>> nodes. Also here using an if/else if structure won't hurt and bring clarity >>>>>> in the operations. >>>>>> >>>>> >>>> Added the check and used if/else >>>> >>>> >>>>> 3) In BeginTransactionBlock of xact.c, you need to add >>>>>> !IsConnFromCoord() as a secondary condition to avoid remote Coordinator >>>>>> problems. >>>>>> >>>>> >>>> Done >>>> >>>> >>>>> 4) By looking at the patch It is possible to reduce the number of >>>>>> variables used for the control of Command ID. In the current version of >>>>>> your patch, you are using 5 static variables: >>>>>> static CommandId cmdIdFromCoord; /* Datanode will use >>>>>> command id received from coordinator */ >>>>>> static bool cmdIdRcvdAtDatanode; /* Has the coordinator >>>>>> sent command id to the data node? */ >>>>>> static bool enbCmdIdChgReporting; /* Should datanode report >>>>>> command id changes to coordinator? */ >>>>>> static bool enbCoordCmdIdSending; /* Should coordinator >>>>>> send command id to the data node? */ >>>>>> static CommandId cmdIdFromDatanode; /* Coordinator will receive >>>>>> updated command id from datanode */ >>>>>> However, cmdIdRcvdAtDatanode, cmdIdFromCoord and enbCmdIdChgReporting >>>>>> are used only at Datanode. >>>>>> cmdIdFromDatanode and enbCoordCmdIdSending are only used at >>>>>> Coordinator level. >>>>>> You should combine enbCoordCmdIdSending and enbCmdIdChgReporting into >>>>>> a single variable with a generic name, like let's say sendCommandId or >>>>>> reportCommandId. Specifying in comments the role of this unique variable >>>>>> for local Coordinator and remote nodes will help. Those 2 variables play >>>>>> the same role which is to control if a >>>>>> In the case of Coordinator, this variable is flagged to true when >>>>>> transforming INSERT statements (transformInsertStmt) and when a transaction >>>>>> block is begun (BeginTransactionBlock). >>>>>> In the case of a Datanode, this variable is flagged to true only when >>>>>> a command ID has been received from Coordinator. >>>>>> >>>>>> >>>> Done, I used functions to retain code readability and inside the >>>> functions I used the same variables as u suggested. >>>> >>>> >>>>> Then, you should combine cmdIdFromCoord and cmdIdFromDatanode into a >>>>>> single variable, like receivedCommandId. In this case, being at Coordinator >>>>>> or at Datanode means that this command ID has been received from a remode >>>>>> node. If you have other ideas for names please feel free! >>>>>> >>>>> >>>> Done, again I used functions to retain code readability. I however had >>>> to add checks inside the functions to make sure that the functions intended >>>> for data node work only for data nodes and the ones intended for >>>> coordinators work only for coordinators. >>>> >>>> >>>>> Also changing the name cmdIdRcvdAtDatanode to isCommandIdReceived >>>>>> might help... >>>>>> >>>>> >>>> Done >>>> >>>> >>>>> Just a question, what means the preffix enb? (just by curiosity) :) >>>>>> >>>>>> 5) In xact.c, at the end of CommitTransaction, PrepareTransaction, >>>>>> CleanupTransaction, you will need to change the cleanup of the variables if >>>>>> you take into account comment 4). >>>>>> >>>>> >>>> Done >>>> >>>> >>>>> 6) In BeginTransactionBlock, PrepareTransactionBlock, >>>>>> EndTransactionBlock and UserAbortTransactionBlock of xact.c, you need to >>>>>> set an additional !IsConnFromCoord() at this place, this will avoid remote >>>>>> Coordinators trying to send CommandIds to inexistent nodes. >>>>>> if(IS_PGXC_COORDINATOR) >>>>>> + SetCmdIdSending(false); >>>>>> >>>>> >>>> Done >>>> >>>> >>>>> >>>>>> 7) You should combine the messages 'Y' and 'M'. The content and the >>>>>> goals are the same. You can also eliminate the 2nd byte of 'Y' without >>>>>> consequences I think. >>>>>> >>>>> >>>> I did not combine these two. The intention of having them separate is >>>> because 'Y' should not be tied to 'M', In future we might need data nodes >>>> to notify coordinators of some thing else. And for the same reason I have >>>> added a second byte in 'Y' so that we can have the same command notify >>>> coordinators of some thing else. >>>> >>>> >>>>> 8) Why didn't you put the reception of message 'M' in PostgresMain of >>>>>> postgres.c at the same place as the reception of GXID, snapshot, timestamp, >>>>>> etc. >>>>>> Putting the reception of message at the end of SocketBackend makes >>>>>> the code kind of inconsistent. If you keep it inside SocketBackend, you >>>>>> should at least manage the message inside the switch, however I do not see >>>>>> any particular reason why you receive this message in SocketBackend and not >>>>>> in PostgresMain. So anything I should know? >>>>>> You should also remove the flag IS_PGXC_DATANODE when receiving the >>>>>> command ID. Remote Coordinator might need to be able to receive command IDs >>>>>> as well, even if there is no direct usage now, but for stuff like triggers?? >>>>>> >>>>> >>>> Done, I added handling of 'M' in PostgresMain. >>>> >>>> >>>>> 9) I found some diffs with the test select_views. It might be worth >>>>>> to look at what is happening. It is perhaps an environment-related issue. >>>>>> Btw, my regression diffs are attached. >>>>>> >>>>> >>>> I was also getting these diffs but it was difficult for me to think >>>> that the changes I did could cause a change in oder of rows. Here is what I >>>> did to dig the issue deeper. >>>> >>>> create table test_text(a text); >>>> >>>> insert into test_text values('Bancroft Ave'); >>>> insert into test_text values('Bancroft Ave'); >>>> insert into test_text values('Birch St'); >>>> insert into test_text values('Birch St'); >>>> insert into test_text values('Blacow Road'); >>>> insert into test_text values('Bridgepointe Dr'); >>>> insert into test_text values('Broadmore Ave'); >>>> insert into test_text values('Broadway '); >>>> insert into test_text values('B St'); >>>> >>>> select * from test_text order by 1 >>>> produces same results with plain PG and XC >>>> >>>> Also I changed the query in plain PG select_views test case to >>>> SELECT * FROM street ORDER BY name,cname,thepath::text; >>>> and obtained same results with plain PG and XC. >>>> >>>> I am not sure why the test case was passing earlier with the current >>>> possible expected outputs. >>>> >>>> Summarized design notes are attached with the mail for reference. >>>> >>>> 10) The modifications you are bringing to combocid_1.out are >>>>>> consistent with postgres, cool! >>>>>> >>>>>> No performance impact with this patch! Tested and approved. This is >>>>>> good. >>>>>> No regression impact with this patch. >>>>>> >>>>>> So, to conclude, I think it is a good piece of work, really cool >>>>>> stuff that I believe will be helpful for triggers and even other things. >>>>>> It just needs to be polished and simplified a bit. However basics are >>>>>> here, and performance is not impacted. Really it was worth spending time on >>>>>> that. The additional test cases also look to be sufficient, but I might be >>>>>> missing smth so feel free to add new tests if necessary. >>>>>> Thanks! >>>>>> >>>>>> -- >>>>>> Michael Paquier >>>>>> http://michael.otacoo.com >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>>> >>>>> Phone: 92-334-5100153 >>>>> >>>>> Website: www.enterprisedb.com >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>> >>>>> This e-mail message (and any attachment) is intended for the use of >>>>> the individual or entity to whom it is addressed. This message >>>>> contains information from EnterpriseDB Corporation that may be >>>>> privileged, confidential, or exempt from disclosure under applicable >>>>> law. If you are not the intended recipient or authorized to receive >>>>> this for the intended recipient, any use, dissemination, distribution, >>>>> retention, archiving, or copying of this communication is strictly >>>>> prohibited. If you have received this e-mail in error, please notify >>>>> the sender immediately by reply e-mail and delete this message. >>>>> >>>> >>>> >>>> >>>> -- >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company >>>> >>>> Phone: 92-334-5100153 >>>> >>>> Website: www.enterprisedb.com >>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>> >>>> This e-mail message (and any attachment) is intended for the use of >>>> the individual or entity to whom it is addressed. This message >>>> contains information from EnterpriseDB Corporation that may be >>>> privileged, confidential, or exempt from disclosure under applicable >>>> law. If you are not the intended recipient or authorized to receive >>>> this for the intended recipient, any use, dissemination, distribution, >>>> retention, archiving, or copying of this communication is strictly >>>> prohibited. If you have received this e-mail in error, please notify >>>> the sender immediately by reply e-mail and delete this message. >>>> >>> >>> >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > Michael Paquier > http://michael.otacoo.com > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Michael P. <mic...@gm...> - 2012-07-23 05:14:18
|
Yeah OK. Btw, it should be better to set up the entire condition inside this loop, checking for parent/child INSERT SELECT condition on remote nodes doesn't make sense either. if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { target_rte = rt_fetch(qry->resultRelation, pstate->p_rtable); if (is_relation_child(target_rte, selectQuery->rtable)) { qry->is_ins_child_sel_parent = true; SetSendCommandId(true); } } On Mon, Jul 23, 2012 at 2:05 PM, Abbas Butt <abb...@en...>wrote: > Thanks Michael, I just reviewed the patch and found one small thing. > > In the function transformInsertStmt where we set > SetSendCommandId(true); > I think it would be better to put this statement under an > if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) > > Other than that I am OK with the changes. > > Regards > > > On Mon, Jul 23, 2012 at 7:08 AM, Michael Paquier < > mic...@gm...> wrote: > >> Hi, >> >> I spent some time this morning rewriting your patch. >> I reduced the number of functions and parameters to a minimum number and >> eliminated the useless message 'Y'. >> 'Y' definitely doesn't make sense as it has absolutely the same role as >> 'M'. In the former patch, the notion of notification is simply to send a >> message from Coordinator to Datanode, but a Datanode can also send a GXID, >> snapshot and timestamp using the same message types as the one used when a >> Coordinator sends data to a remote node. So we shouldn't lose this >> portability of the new messages. The only thing that changes for >> Coordinator->remoteNode and remoteCode->Coordinator communication protocol >> is the place where message is read. in the first case, message is received >> by remote node in postgres.c stuff. In the second case, message is received >> inside execRemote.c, so this distinction makes perfectly safe the fact of >> using the same message 'M', making 'Y' unnecessary. >> Also, it is really important to limit the number of used message types to >> reduce the footprint of XC code on PostgreSQL. >> >> There are no problems with regressions, and performance is not impacted. >> I also corrected some comment format and added some other comments at >> various places. >> In consequence, and to simplify the work of everybody, I am going to >> commit this largely simplified version of the patch. >> >> Thanks, >> >> >> On Sat, Jul 21, 2012 at 6:29 PM, Abbas Butt <abb...@en...>wrote: >> >>> >>> >>> On Fri, Jul 20, 2012 at 11:04 AM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi, >>>> >>>> Sorry for being a little bit picky here: >>>> 1) In GetCurrentSubTransactionId of xact.c, you can replace: >>>> if (IS_PGXC_DATANODE && isCommandIdReceived) >>>> { >>>> foo >>>> } >>>> else >>>> { >>>> if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>>> re_foo >>>> } >>>> by: >>>> if (IS_PGXC_DATANODE && isCommandIdReceived) >>>> { >>>> foo >>>> } >>>> else if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>>> { >>>> re_foo >>>> } >>>> >>> >>> Done >>> >>> >>>> 2) I am not so much a fan of this 'Y' message as for now it just has a >>>> single usage, and M is doing exactly the same thing... But seen as a >>>> message that a remote node sends a notification to its backend, ok it makes >>>> some sense. In this case you should avoid to use a magic number like '1'. >>>> Please use a define like: >>>> #define DATANODE_NOTIFICATION_COMMAND_ID '1' >>>> A place like pgxc.h would make sense to define that. >>>> Then use this variable at the places where you need it >>>> (HandleDatanodeNotification of execRemote.c and ReportCommandIdChange of >>>> xact.c). It will bring more visibility to your code. >>>> >>> >>> Done >>> >>> >>>> 3) SetCmdIdSending and SetCmdIdReporting should be combined, they serve >>>> the same purpose. Their only difference is the node type where they are >>>> used. Let's make that generic. As a general name, SetCommandIdSending would >>>> be enough. >>>> >>> >>> I am not getting how to do that, consider the example of >>> StartTransaction function where we want to SetCmdIdReporting(false). >>> Suppose we combine both functions and have the new function like this >>> >>> void SetCmdIdSending(bool enb) >>> { >>> if(IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>> sendCommandId = enb; >>> else if(IS_PGXC_DATANODE) >>> sendCommandId = enb; >>> } >>> >>> and that we call this function instead of SetCmdIdReporting. It will >>> work OK for data node but when the same code executes for coordinator we >>> will set sendCommandId to false which was not required. Perhaps I did not >>> get your point. >>> >>> >>>> 4) GetCmdIdOfDatanode and GetCmdIdOfCoordinator should also be >>>> combined. They do exactly the same thing, the only difference is that one >>>> is launched on Datanode and the other on Coordinator. A general name like >>>> GetCommandIdReceived would be nicer. >>>> >>> >>> See my comment for point (3) >>> >>> 5) In portalcmds.c, PerformCursorOpen. You need to replace >>>> if (IS_PGXC_COORDINATOR) >>>> GetCurrentCommandId(true); >>>> by: >>>> if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>>> GetCurrentCommandId(true); >>>> >>> >>> Done. >>> >>> >>>> 6) For the regressions, I am still getting the diffs. Based on your >>>> test, I get this ordering for PG and XC: >>>> pgxc# select * from test_text order by 1; >>>> a >>>> ------------------------------------ >>>> Bancroft Ave >>>> Bancroft Ave >>>> Birch St >>>> Birch St >>>> Blacow Road >>>> Bridgepointe Dr >>>> Broadmore Ave >>>> Broadway >>>> B St >>>> (9 rows) >>>> If we are getting the same diffs we should change the output, it looks >>>> to be a side-effect of your fix for INSERT SELECT on parent-child, >>>> >>> >>> Done >>> >>> >>>> >>>> On Fri, Jul 20, 2012 at 2:15 AM, Abbas Butt < >>>> abb...@en...> wrote: >>>> >>>>> Attached please find the updated patch. >>>>> >>>>> On Thu, Jul 19, 2012 at 11:01 AM, Abbas Butt < >>>>> abb...@en...> wrote: >>>>> >>>>>> Thanks Michael for the review, Thanks Ashutosh for the comments, >>>>>> Michael I will accommodate your valuable feedback and will send >>>>>> revised patch. >>>>>> Ashutosh I will send the summarized design in the email with the >>>>>> revised patch. >>>>>> BTW enb is short for enable! >>>>>> >>>>>> >>>>>> On Thu, Jul 19, 2012 at 10:37 AM, Michael Paquier < >>>>>> mic...@gm...> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Finally I took some time to look at this code, it is important >>>>>>> support. >>>>>>> 1) The code is not realigned with current master (xc_misc.sql) and >>>>>>> contains whitespaces (2 in xc_misc.out, perhaps other files as well). So I >>>>>>> realigned it myself to test the code. >>>>>>> >>>>>> >>>>> The attached patch is aligned with the current master and I have >>>>> removed white spaces. >>>>> >>>>> >>>>>> 2) In GetCurrentCommandId xact.c, you need to set !IsConnFromCoord() >>>>>>> with (IS_PGXC_COORDINATOR) to avoid remote coordinators incrementing the >>>>>>> command ID. Remote Coordinators do not receive any command IDs from remote >>>>>>> nodes. Also here using an if/else if structure won't hurt and bring clarity >>>>>>> in the operations. >>>>>>> >>>>>> >>>>> Added the check and used if/else >>>>> >>>>> >>>>>> 3) In BeginTransactionBlock of xact.c, you need to add >>>>>>> !IsConnFromCoord() as a secondary condition to avoid remote Coordinator >>>>>>> problems. >>>>>>> >>>>>> >>>>> Done >>>>> >>>>> >>>>>> 4) By looking at the patch It is possible to reduce the number of >>>>>>> variables used for the control of Command ID. In the current version of >>>>>>> your patch, you are using 5 static variables: >>>>>>> static CommandId cmdIdFromCoord; /* Datanode will use >>>>>>> command id received from coordinator */ >>>>>>> static bool cmdIdRcvdAtDatanode; /* Has the coordinator >>>>>>> sent command id to the data node? */ >>>>>>> static bool enbCmdIdChgReporting; /* Should datanode >>>>>>> report command id changes to coordinator? */ >>>>>>> static bool enbCoordCmdIdSending; /* Should coordinator >>>>>>> send command id to the data node? */ >>>>>>> static CommandId cmdIdFromDatanode; /* Coordinator will >>>>>>> receive updated command id from datanode */ >>>>>>> However, cmdIdRcvdAtDatanode, cmdIdFromCoord and >>>>>>> enbCmdIdChgReporting are used only at Datanode. >>>>>>> cmdIdFromDatanode and enbCoordCmdIdSending are only used at >>>>>>> Coordinator level. >>>>>>> You should combine enbCoordCmdIdSending and enbCmdIdChgReporting >>>>>>> into a single variable with a generic name, like let's say sendCommandId or >>>>>>> reportCommandId. Specifying in comments the role of this unique variable >>>>>>> for local Coordinator and remote nodes will help. Those 2 variables play >>>>>>> the same role which is to control if a >>>>>>> In the case of Coordinator, this variable is flagged to true when >>>>>>> transforming INSERT statements (transformInsertStmt) and when a transaction >>>>>>> block is begun (BeginTransactionBlock). >>>>>>> In the case of a Datanode, this variable is flagged to true only >>>>>>> when a command ID has been received from Coordinator. >>>>>>> >>>>>>> >>>>> Done, I used functions to retain code readability and inside the >>>>> functions I used the same variables as u suggested. >>>>> >>>>> >>>>>> Then, you should combine cmdIdFromCoord and cmdIdFromDatanode into a >>>>>>> single variable, like receivedCommandId. In this case, being at Coordinator >>>>>>> or at Datanode means that this command ID has been received from a remode >>>>>>> node. If you have other ideas for names please feel free! >>>>>>> >>>>>> >>>>> Done, again I used functions to retain code readability. I however had >>>>> to add checks inside the functions to make sure that the functions intended >>>>> for data node work only for data nodes and the ones intended for >>>>> coordinators work only for coordinators. >>>>> >>>>> >>>>>> Also changing the name cmdIdRcvdAtDatanode to isCommandIdReceived >>>>>>> might help... >>>>>>> >>>>>> >>>>> Done >>>>> >>>>> >>>>>> Just a question, what means the preffix enb? (just by curiosity) :) >>>>>>> >>>>>>> 5) In xact.c, at the end of CommitTransaction, PrepareTransaction, >>>>>>> CleanupTransaction, you will need to change the cleanup of the variables if >>>>>>> you take into account comment 4). >>>>>>> >>>>>> >>>>> Done >>>>> >>>>> >>>>>> 6) In BeginTransactionBlock, PrepareTransactionBlock, >>>>>>> EndTransactionBlock and UserAbortTransactionBlock of xact.c, you need to >>>>>>> set an additional !IsConnFromCoord() at this place, this will avoid remote >>>>>>> Coordinators trying to send CommandIds to inexistent nodes. >>>>>>> if(IS_PGXC_COORDINATOR) >>>>>>> + SetCmdIdSending(false); >>>>>>> >>>>>> >>>>> Done >>>>> >>>>> >>>>>> >>>>>>> 7) You should combine the messages 'Y' and 'M'. The content and the >>>>>>> goals are the same. You can also eliminate the 2nd byte of 'Y' without >>>>>>> consequences I think. >>>>>>> >>>>>> >>>>> I did not combine these two. The intention of having them separate is >>>>> because 'Y' should not be tied to 'M', In future we might need data nodes >>>>> to notify coordinators of some thing else. And for the same reason I have >>>>> added a second byte in 'Y' so that we can have the same command notify >>>>> coordinators of some thing else. >>>>> >>>>> >>>>>> 8) Why didn't you put the reception of message 'M' in PostgresMain of >>>>>>> postgres.c at the same place as the reception of GXID, snapshot, timestamp, >>>>>>> etc. >>>>>>> Putting the reception of message at the end of SocketBackend makes >>>>>>> the code kind of inconsistent. If you keep it inside SocketBackend, you >>>>>>> should at least manage the message inside the switch, however I do not see >>>>>>> any particular reason why you receive this message in SocketBackend and not >>>>>>> in PostgresMain. So anything I should know? >>>>>>> You should also remove the flag IS_PGXC_DATANODE when receiving the >>>>>>> command ID. Remote Coordinator might need to be able to receive command IDs >>>>>>> as well, even if there is no direct usage now, but for stuff like triggers?? >>>>>>> >>>>>> >>>>> Done, I added handling of 'M' in PostgresMain. >>>>> >>>>> >>>>>> 9) I found some diffs with the test select_views. It might be worth >>>>>>> to look at what is happening. It is perhaps an environment-related issue. >>>>>>> Btw, my regression diffs are attached. >>>>>>> >>>>>> >>>>> I was also getting these diffs but it was difficult for me to think >>>>> that the changes I did could cause a change in oder of rows. Here is what I >>>>> did to dig the issue deeper. >>>>> >>>>> create table test_text(a text); >>>>> >>>>> insert into test_text values('Bancroft Ave'); >>>>> insert into test_text values('Bancroft Ave'); >>>>> insert into test_text values('Birch St'); >>>>> insert into test_text values('Birch St'); >>>>> insert into test_text values('Blacow Road'); >>>>> insert into test_text values('Bridgepointe Dr'); >>>>> insert into test_text values('Broadmore Ave'); >>>>> insert into test_text values('Broadway '); >>>>> insert into test_text values('B St'); >>>>> >>>>> select * from test_text order by 1 >>>>> produces same results with plain PG and XC >>>>> >>>>> Also I changed the query in plain PG select_views test case to >>>>> SELECT * FROM street ORDER BY name,cname,thepath::text; >>>>> and obtained same results with plain PG and XC. >>>>> >>>>> I am not sure why the test case was passing earlier with the current >>>>> possible expected outputs. >>>>> >>>>> Summarized design notes are attached with the mail for reference. >>>>> >>>>> 10) The modifications you are bringing to combocid_1.out are >>>>>>> consistent with postgres, cool! >>>>>>> >>>>>>> No performance impact with this patch! Tested and approved. This is >>>>>>> good. >>>>>>> No regression impact with this patch. >>>>>>> >>>>>>> So, to conclude, I think it is a good piece of work, really cool >>>>>>> stuff that I believe will be helpful for triggers and even other things. >>>>>>> It just needs to be polished and simplified a bit. However basics >>>>>>> are here, and performance is not impacted. Really it was worth spending >>>>>>> time on that. The additional test cases also look to be sufficient, but I >>>>>>> might be missing smth so feel free to add new tests if necessary. >>>>>>> Thanks! >>>>>>> >>>>>>> -- >>>>>>> Michael Paquier >>>>>>> http://michael.otacoo.com >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> Abbas >>>>>> Architect >>>>>> EnterpriseDB Corporation >>>>>> The Enterprise PostgreSQL Company >>>>>> >>>>>> Phone: 92-334-5100153 >>>>>> >>>>>> Website: www.enterprisedb.com >>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>> >>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>> the individual or entity to whom it is addressed. This message >>>>>> contains information from EnterpriseDB Corporation that may be >>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>> law. If you are not the intended recipient or authorized to receive >>>>>> this for the intended recipient, any use, dissemination, distribution, >>>>>> retention, archiving, or copying of this communication is strictly >>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>> the sender immediately by reply e-mail and delete this message. >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>>> >>>>> Phone: 92-334-5100153 >>>>> >>>>> Website: www.enterprisedb.com >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>> >>>>> This e-mail message (and any attachment) is intended for the use of >>>>> the individual or entity to whom it is addressed. This message >>>>> contains information from EnterpriseDB Corporation that may be >>>>> privileged, confidential, or exempt from disclosure under applicable >>>>> law. If you are not the intended recipient or authorized to receive >>>>> this for the intended recipient, any use, dissemination, distribution, >>>>> retention, archiving, or copying of this communication is strictly >>>>> prohibited. If you have received this e-mail in error, please notify >>>>> the sender immediately by reply e-mail and delete this message. >>>>> >>>> >>>> >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >>> >>> Phone: 92-334-5100153 >>> >>> Website: www.enterprisedb.com >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. >>> >> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- Michael Paquier http://michael.otacoo.com |
From: Abbas B. <abb...@en...> - 2012-07-23 05:17:35
|
Agreed. Please go ahead and check in this fix. On Mon, Jul 23, 2012 at 10:14 AM, Michael Paquier <mic...@gm... > wrote: > Yeah OK. Btw, it should be better to set up the entire condition inside > this loop, checking for parent/child INSERT SELECT condition on remote > nodes doesn't make sense either. > if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) > { > target_rte = rt_fetch(qry->resultRelation, pstate->p_rtable); > if (is_relation_child(target_rte, selectQuery->rtable)) > { > qry->is_ins_child_sel_parent = true; > SetSendCommandId(true); > > } > } > > On Mon, Jul 23, 2012 at 2:05 PM, Abbas Butt <abb...@en...>wrote: > >> Thanks Michael, I just reviewed the patch and found one small thing. >> >> In the function transformInsertStmt where we set >> SetSendCommandId(true); >> I think it would be better to put this statement under an >> if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >> >> Other than that I am OK with the changes. >> >> Regards >> >> >> On Mon, Jul 23, 2012 at 7:08 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi, >>> >>> I spent some time this morning rewriting your patch. >>> I reduced the number of functions and parameters to a minimum number and >>> eliminated the useless message 'Y'. >>> 'Y' definitely doesn't make sense as it has absolutely the same role as >>> 'M'. In the former patch, the notion of notification is simply to send a >>> message from Coordinator to Datanode, but a Datanode can also send a GXID, >>> snapshot and timestamp using the same message types as the one used when a >>> Coordinator sends data to a remote node. So we shouldn't lose this >>> portability of the new messages. The only thing that changes for >>> Coordinator->remoteNode and remoteCode->Coordinator communication protocol >>> is the place where message is read. in the first case, message is received >>> by remote node in postgres.c stuff. In the second case, message is received >>> inside execRemote.c, so this distinction makes perfectly safe the fact of >>> using the same message 'M', making 'Y' unnecessary. >>> Also, it is really important to limit the number of used message types >>> to reduce the footprint of XC code on PostgreSQL. >>> >>> There are no problems with regressions, and performance is not impacted. >>> I also corrected some comment format and added some other comments at >>> various places. >>> In consequence, and to simplify the work of everybody, I am going to >>> commit this largely simplified version of the patch. >>> >>> Thanks, >>> >>> >>> On Sat, Jul 21, 2012 at 6:29 PM, Abbas Butt <abb...@en... >>> > wrote: >>> >>>> >>>> >>>> On Fri, Jul 20, 2012 at 11:04 AM, Michael Paquier < >>>> mic...@gm...> wrote: >>>> >>>>> Hi, >>>>> >>>>> Sorry for being a little bit picky here: >>>>> 1) In GetCurrentSubTransactionId of xact.c, you can replace: >>>>> if (IS_PGXC_DATANODE && isCommandIdReceived) >>>>> { >>>>> foo >>>>> } >>>>> else >>>>> { >>>>> if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>>>> re_foo >>>>> } >>>>> by: >>>>> if (IS_PGXC_DATANODE && isCommandIdReceived) >>>>> { >>>>> foo >>>>> } >>>>> else if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>>>> { >>>>> re_foo >>>>> } >>>>> >>>> >>>> Done >>>> >>>> >>>>> 2) I am not so much a fan of this 'Y' message as for now it just has a >>>>> single usage, and M is doing exactly the same thing... But seen as a >>>>> message that a remote node sends a notification to its backend, ok it makes >>>>> some sense. In this case you should avoid to use a magic number like '1'. >>>>> Please use a define like: >>>>> #define DATANODE_NOTIFICATION_COMMAND_ID '1' >>>>> A place like pgxc.h would make sense to define that. >>>>> Then use this variable at the places where you need it >>>>> (HandleDatanodeNotification of execRemote.c and ReportCommandIdChange of >>>>> xact.c). It will bring more visibility to your code. >>>>> >>>> >>>> Done >>>> >>>> >>>>> 3) SetCmdIdSending and SetCmdIdReporting should be combined, they >>>>> serve the same purpose. Their only difference is the node type where they >>>>> are used. Let's make that generic. As a general name, SetCommandIdSending >>>>> would be enough. >>>>> >>>> >>>> I am not getting how to do that, consider the example of >>>> StartTransaction function where we want to SetCmdIdReporting(false). >>>> Suppose we combine both functions and have the new function like this >>>> >>>> void SetCmdIdSending(bool enb) >>>> { >>>> if(IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>>> sendCommandId = enb; >>>> else if(IS_PGXC_DATANODE) >>>> sendCommandId = enb; >>>> } >>>> >>>> and that we call this function instead of SetCmdIdReporting. It will >>>> work OK for data node but when the same code executes for coordinator we >>>> will set sendCommandId to false which was not required. Perhaps I did not >>>> get your point. >>>> >>>> >>>>> 4) GetCmdIdOfDatanode and GetCmdIdOfCoordinator should also be >>>>> combined. They do exactly the same thing, the only difference is that one >>>>> is launched on Datanode and the other on Coordinator. A general name like >>>>> GetCommandIdReceived would be nicer. >>>>> >>>> >>>> See my comment for point (3) >>>> >>>> 5) In portalcmds.c, PerformCursorOpen. You need to replace >>>>> if (IS_PGXC_COORDINATOR) >>>>> GetCurrentCommandId(true); >>>>> by: >>>>> if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) >>>>> GetCurrentCommandId(true); >>>>> >>>> >>>> Done. >>>> >>>> >>>>> 6) For the regressions, I am still getting the diffs. Based on your >>>>> test, I get this ordering for PG and XC: >>>>> pgxc# select * from test_text order by 1; >>>>> a >>>>> ------------------------------------ >>>>> Bancroft Ave >>>>> Bancroft Ave >>>>> Birch St >>>>> Birch St >>>>> Blacow Road >>>>> Bridgepointe Dr >>>>> Broadmore Ave >>>>> Broadway >>>>> B St >>>>> (9 rows) >>>>> If we are getting the same diffs we should change the output, it looks >>>>> to be a side-effect of your fix for INSERT SELECT on parent-child, >>>>> >>>> >>>> Done >>>> >>>> >>>>> >>>>> On Fri, Jul 20, 2012 at 2:15 AM, Abbas Butt < >>>>> abb...@en...> wrote: >>>>> >>>>>> Attached please find the updated patch. >>>>>> >>>>>> On Thu, Jul 19, 2012 at 11:01 AM, Abbas Butt < >>>>>> abb...@en...> wrote: >>>>>> >>>>>>> Thanks Michael for the review, Thanks Ashutosh for the comments, >>>>>>> Michael I will accommodate your valuable feedback and will send >>>>>>> revised patch. >>>>>>> Ashutosh I will send the summarized design in the email with the >>>>>>> revised patch. >>>>>>> BTW enb is short for enable! >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 19, 2012 at 10:37 AM, Michael Paquier < >>>>>>> mic...@gm...> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Finally I took some time to look at this code, it is important >>>>>>>> support. >>>>>>>> 1) The code is not realigned with current master (xc_misc.sql) and >>>>>>>> contains whitespaces (2 in xc_misc.out, perhaps other files as well). So I >>>>>>>> realigned it myself to test the code. >>>>>>>> >>>>>>> >>>>>> The attached patch is aligned with the current master and I have >>>>>> removed white spaces. >>>>>> >>>>>> >>>>>>> 2) In GetCurrentCommandId xact.c, you need to set >>>>>>>> !IsConnFromCoord() with (IS_PGXC_COORDINATOR) to avoid remote coordinators >>>>>>>> incrementing the command ID. Remote Coordinators do not receive any command >>>>>>>> IDs from remote nodes. Also here using an if/else if structure won't hurt >>>>>>>> and bring clarity in the operations. >>>>>>>> >>>>>>> >>>>>> Added the check and used if/else >>>>>> >>>>>> >>>>>>> 3) In BeginTransactionBlock of xact.c, you need to add >>>>>>>> !IsConnFromCoord() as a secondary condition to avoid remote Coordinator >>>>>>>> problems. >>>>>>>> >>>>>>> >>>>>> Done >>>>>> >>>>>> >>>>>>> 4) By looking at the patch It is possible to reduce the number of >>>>>>>> variables used for the control of Command ID. In the current version of >>>>>>>> your patch, you are using 5 static variables: >>>>>>>> static CommandId cmdIdFromCoord; /* Datanode will use >>>>>>>> command id received from coordinator */ >>>>>>>> static bool cmdIdRcvdAtDatanode; /* Has the coordinator >>>>>>>> sent command id to the data node? */ >>>>>>>> static bool enbCmdIdChgReporting; /* Should datanode >>>>>>>> report command id changes to coordinator? */ >>>>>>>> static bool enbCoordCmdIdSending; /* Should coordinator >>>>>>>> send command id to the data node? */ >>>>>>>> static CommandId cmdIdFromDatanode; /* Coordinator will >>>>>>>> receive updated command id from datanode */ >>>>>>>> However, cmdIdRcvdAtDatanode, cmdIdFromCoord and >>>>>>>> enbCmdIdChgReporting are used only at Datanode. >>>>>>>> cmdIdFromDatanode and enbCoordCmdIdSending are only used at >>>>>>>> Coordinator level. >>>>>>>> You should combine enbCoordCmdIdSending and enbCmdIdChgReporting >>>>>>>> into a single variable with a generic name, like let's say sendCommandId or >>>>>>>> reportCommandId. Specifying in comments the role of this unique variable >>>>>>>> for local Coordinator and remote nodes will help. Those 2 variables play >>>>>>>> the same role which is to control if a >>>>>>>> In the case of Coordinator, this variable is flagged to true when >>>>>>>> transforming INSERT statements (transformInsertStmt) and when a transaction >>>>>>>> block is begun (BeginTransactionBlock). >>>>>>>> In the case of a Datanode, this variable is flagged to true only >>>>>>>> when a command ID has been received from Coordinator. >>>>>>>> >>>>>>>> >>>>>> Done, I used functions to retain code readability and inside the >>>>>> functions I used the same variables as u suggested. >>>>>> >>>>>> >>>>>>> Then, you should combine cmdIdFromCoord and cmdIdFromDatanode into a >>>>>>>> single variable, like receivedCommandId. In this case, being at Coordinator >>>>>>>> or at Datanode means that this command ID has been received from a remode >>>>>>>> node. If you have other ideas for names please feel free! >>>>>>>> >>>>>>> >>>>>> Done, again I used functions to retain code readability. I however >>>>>> had to add checks inside the functions to make sure that the functions >>>>>> intended for data node work only for data nodes and the ones intended for >>>>>> coordinators work only for coordinators. >>>>>> >>>>>> >>>>>>> Also changing the name cmdIdRcvdAtDatanode to isCommandIdReceived >>>>>>>> might help... >>>>>>>> >>>>>>> >>>>>> Done >>>>>> >>>>>> >>>>>>> Just a question, what means the preffix enb? (just by curiosity) :) >>>>>>>> >>>>>>>> 5) In xact.c, at the end of CommitTransaction, PrepareTransaction, >>>>>>>> CleanupTransaction, you will need to change the cleanup of the variables if >>>>>>>> you take into account comment 4). >>>>>>>> >>>>>>> >>>>>> Done >>>>>> >>>>>> >>>>>>> 6) In BeginTransactionBlock, PrepareTransactionBlock, >>>>>>>> EndTransactionBlock and UserAbortTransactionBlock of xact.c, you need to >>>>>>>> set an additional !IsConnFromCoord() at this place, this will avoid remote >>>>>>>> Coordinators trying to send CommandIds to inexistent nodes. >>>>>>>> if(IS_PGXC_COORDINATOR) >>>>>>>> + SetCmdIdSending(false); >>>>>>>> >>>>>>> >>>>>> Done >>>>>> >>>>>> >>>>>>> >>>>>>>> 7) You should combine the messages 'Y' and 'M'. The content and the >>>>>>>> goals are the same. You can also eliminate the 2nd byte of 'Y' without >>>>>>>> consequences I think. >>>>>>>> >>>>>>> >>>>>> I did not combine these two. The intention of having them separate is >>>>>> because 'Y' should not be tied to 'M', In future we might need data nodes >>>>>> to notify coordinators of some thing else. And for the same reason I have >>>>>> added a second byte in 'Y' so that we can have the same command notify >>>>>> coordinators of some thing else. >>>>>> >>>>>> >>>>>>> 8) Why didn't you put the reception of message 'M' in PostgresMain >>>>>>>> of postgres.c at the same place as the reception of GXID, snapshot, >>>>>>>> timestamp, etc. >>>>>>>> Putting the reception of message at the end of SocketBackend makes >>>>>>>> the code kind of inconsistent. If you keep it inside SocketBackend, you >>>>>>>> should at least manage the message inside the switch, however I do not see >>>>>>>> any particular reason why you receive this message in SocketBackend and not >>>>>>>> in PostgresMain. So anything I should know? >>>>>>>> You should also remove the flag IS_PGXC_DATANODE when receiving the >>>>>>>> command ID. Remote Coordinator might need to be able to receive command IDs >>>>>>>> as well, even if there is no direct usage now, but for stuff like triggers?? >>>>>>>> >>>>>>> >>>>>> Done, I added handling of 'M' in PostgresMain. >>>>>> >>>>>> >>>>>>> 9) I found some diffs with the test select_views. It might be worth >>>>>>>> to look at what is happening. It is perhaps an environment-related issue. >>>>>>>> Btw, my regression diffs are attached. >>>>>>>> >>>>>>> >>>>>> I was also getting these diffs but it was difficult for me to think >>>>>> that the changes I did could cause a change in oder of rows. Here is what I >>>>>> did to dig the issue deeper. >>>>>> >>>>>> create table test_text(a text); >>>>>> >>>>>> insert into test_text values('Bancroft Ave'); >>>>>> insert into test_text values('Bancroft Ave'); >>>>>> insert into test_text values('Birch St'); >>>>>> insert into test_text values('Birch St'); >>>>>> insert into test_text values('Blacow Road'); >>>>>> insert into test_text values('Bridgepointe Dr'); >>>>>> insert into test_text values('Broadmore Ave'); >>>>>> insert into test_text values('Broadway '); >>>>>> insert into test_text values('B St'); >>>>>> >>>>>> select * from test_text order by 1 >>>>>> produces same results with plain PG and XC >>>>>> >>>>>> Also I changed the query in plain PG select_views test case to >>>>>> SELECT * FROM street ORDER BY name,cname,thepath::text; >>>>>> and obtained same results with plain PG and XC. >>>>>> >>>>>> I am not sure why the test case was passing earlier with the current >>>>>> possible expected outputs. >>>>>> >>>>>> Summarized design notes are attached with the mail for reference. >>>>>> >>>>>> 10) The modifications you are bringing to combocid_1.out are >>>>>>>> consistent with postgres, cool! >>>>>>>> >>>>>>>> No performance impact with this patch! Tested and approved. This is >>>>>>>> good. >>>>>>>> No regression impact with this patch. >>>>>>>> >>>>>>>> So, to conclude, I think it is a good piece of work, really cool >>>>>>>> stuff that I believe will be helpful for triggers and even other things. >>>>>>>> It just needs to be polished and simplified a bit. However basics >>>>>>>> are here, and performance is not impacted. Really it was worth spending >>>>>>>> time on that. The additional test cases also look to be sufficient, but I >>>>>>>> might be missing smth so feel free to add new tests if necessary. >>>>>>>> Thanks! >>>>>>>> >>>>>>>> -- >>>>>>>> Michael Paquier >>>>>>>> http://michael.otacoo.com >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> -- >>>>>>> Abbas >>>>>>> Architect >>>>>>> EnterpriseDB Corporation >>>>>>> The Enterprise PostgreSQL Company >>>>>>> >>>>>>> Phone: 92-334-5100153 >>>>>>> >>>>>>> Website: www.enterprisedb.com >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>>> >>>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>>> the individual or entity to whom it is addressed. This message >>>>>>> contains information from EnterpriseDB Corporation that may be >>>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>>> law. If you are not the intended recipient or authorized to receive >>>>>>> this for the intended recipient, any use, dissemination, >>>>>>> distribution, >>>>>>> retention, archiving, or copying of this communication is strictly >>>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>>> the sender immediately by reply e-mail and delete this message. >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> Abbas >>>>>> Architect >>>>>> EnterpriseDB Corporation >>>>>> The Enterprise PostgreSQL Company >>>>>> >>>>>> Phone: 92-334-5100153 >>>>>> >>>>>> Website: www.enterprisedb.com >>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>> >>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>> the individual or entity to whom it is addressed. This message >>>>>> contains information from EnterpriseDB Corporation that may be >>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>> law. If you are not the intended recipient or authorized to receive >>>>>> this for the intended recipient, any use, dissemination, distribution, >>>>>> retention, archiving, or copying of this communication is strictly >>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>> the sender immediately by reply e-mail and delete this message. >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Michael Paquier >>>>> http://michael.otacoo.com >>>>> >>>> >>>> >>>> >>>> -- >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company >>>> >>>> Phone: 92-334-5100153 >>>> >>>> Website: www.enterprisedb.com >>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>> >>>> This e-mail message (and any attachment) is intended for the use of >>>> the individual or entity to whom it is addressed. This message >>>> contains information from EnterpriseDB Corporation that may be >>>> privileged, confidential, or exempt from disclosure under applicable >>>> law. If you are not the intended recipient or authorized to receive >>>> this for the intended recipient, any use, dissemination, distribution, >>>> retention, archiving, or copying of this communication is strictly >>>> prohibited. If you have received this e-mail in error, please notify >>>> the sender immediately by reply e-mail and delete this message. >>>> >>> >>> >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > Michael Paquier > http://michael.otacoo.com > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Michael P. <mic...@gm...> - 2012-07-23 05:33:42
|
On Mon, Jul 23, 2012 at 2:17 PM, Abbas Butt <abb...@en...>wrote: > Agreed. Please go ahead and check in this fix. Thanks, I fixed that and also another issue I found on the way. -- Michael Paquier http://michael.otacoo.com |