You can subscribe to this list here.
2010 |
Jan
|
Feb
|
Mar
|
Apr
(10) |
May
(17) |
Jun
(3) |
Jul
|
Aug
|
Sep
(8) |
Oct
(18) |
Nov
(51) |
Dec
(74) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2011 |
Jan
(47) |
Feb
(44) |
Mar
(44) |
Apr
(102) |
May
(35) |
Jun
(25) |
Jul
(56) |
Aug
(69) |
Sep
(32) |
Oct
(37) |
Nov
(31) |
Dec
(16) |
2012 |
Jan
(34) |
Feb
(127) |
Mar
(218) |
Apr
(252) |
May
(80) |
Jun
(137) |
Jul
(205) |
Aug
(159) |
Sep
(35) |
Oct
(50) |
Nov
(82) |
Dec
(52) |
2013 |
Jan
(107) |
Feb
(159) |
Mar
(118) |
Apr
(163) |
May
(151) |
Jun
(89) |
Jul
(106) |
Aug
(177) |
Sep
(49) |
Oct
(63) |
Nov
(46) |
Dec
(7) |
2014 |
Jan
(65) |
Feb
(128) |
Mar
(40) |
Apr
(11) |
May
(4) |
Jun
(8) |
Jul
(16) |
Aug
(11) |
Sep
(4) |
Oct
(1) |
Nov
(5) |
Dec
(16) |
2015 |
Jan
(5) |
Feb
|
Mar
(2) |
Apr
(5) |
May
(4) |
Jun
(12) |
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
(4) |
2019 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
(2) |
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
S | M | T | W | T | F | S |
---|---|---|---|---|---|---|
1
|
2
(1) |
3
(6) |
4
(19) |
5
|
6
(15) |
7
(2) |
8
(2) |
9
(22) |
10
(20) |
11
(20) |
12
(14) |
13
(12) |
14
(2) |
15
|
16
(14) |
17
(17) |
18
(4) |
19
(8) |
20
(2) |
21
(3) |
22
|
23
(8) |
24
(1) |
25
|
26
(2) |
27
(1) |
28
|
29
|
30
(7) |
31
(3) |
|
|
|
|
From: Michael P. <mic...@gm...> - 2012-07-20 06:04:13
|
Hi, Sorry for being a little bit picky here: 1) In GetCurrentSubTransactionId of xact.c, you can replace: if (IS_PGXC_DATANODE && isCommandIdReceived) { foo } else { if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) re_foo } by: if (IS_PGXC_DATANODE && isCommandIdReceived) { foo } else if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { re_foo } 2) I am not so much a fan of this 'Y' message as for now it just has a single usage, and M is doing exactly the same thing... But seen as a message that a remote node sends a notification to its backend, ok it makes some sense. In this case you should avoid to use a magic number like '1'. Please use a define like: #define DATANODE_NOTIFICATION_COMMAND_ID '1' A place like pgxc.h would make sense to define that. Then use this variable at the places where you need it (HandleDatanodeNotification of execRemote.c and ReportCommandIdChange of xact.c). It will bring more visibility to your code. 3) SetCmdIdSending and SetCmdIdReporting should be combined, they serve the same purpose. Their only difference is the node type where they are used. Let's make that generic. As a general name, SetCommandIdSending would be enough. 4) GetCmdIdOfDatanode and GetCmdIdOfCoordinator should also be combined. They do exactly the same thing, the only difference is that one is launched on Datanode and the other on Coordinator. A general name like GetCommandIdReceived would be nicer. 5) In portalcmds.c, PerformCursorOpen. You need to replace if (IS_PGXC_COORDINATOR) GetCurrentCommandId(true); by: if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) GetCurrentCommandId(true); 6) For the regressions, I am still getting the diffs. Based on your test, I get this ordering for PG and XC: pgxc# select * from test_text order by 1; a ------------------------------------ Bancroft Ave Bancroft Ave Birch St Birch St Blacow Road Bridgepointe Dr Broadmore Ave Broadway B St (9 rows) If we are getting the same diffs we should change the output, it looks to be a side-effect of your fix for INSERT SELECT on parent-child, On Fri, Jul 20, 2012 at 2:15 AM, Abbas Butt <abb...@en...>wrote: > Attached please find the updated patch. > > On Thu, Jul 19, 2012 at 11:01 AM, Abbas Butt <abb...@en...>wrote: > >> Thanks Michael for the review, Thanks Ashutosh for the comments, >> Michael I will accommodate your valuable feedback and will send revised >> patch. >> Ashutosh I will send the summarized design in the email with the revised >> patch. >> BTW enb is short for enable! >> >> >> On Thu, Jul 19, 2012 at 10:37 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi, >>> >>> Finally I took some time to look at this code, it is important support. >>> 1) The code is not realigned with current master (xc_misc.sql) and >>> contains whitespaces (2 in xc_misc.out, perhaps other files as well). So I >>> realigned it myself to test the code. >>> >> > The attached patch is aligned with the current master and I have removed > white spaces. > > >> 2) In GetCurrentCommandId xact.c, you need to set !IsConnFromCoord() >>> with (IS_PGXC_COORDINATOR) to avoid remote coordinators incrementing the >>> command ID. Remote Coordinators do not receive any command IDs from remote >>> nodes. Also here using an if/else if structure won't hurt and bring clarity >>> in the operations. >>> >> > Added the check and used if/else > > >> 3) In BeginTransactionBlock of xact.c, you need to add >>> !IsConnFromCoord() as a secondary condition to avoid remote Coordinator >>> problems. >>> >> > Done > > >> 4) By looking at the patch It is possible to reduce the number of >>> variables used for the control of Command ID. In the current version of >>> your patch, you are using 5 static variables: >>> static CommandId cmdIdFromCoord; /* Datanode will use command >>> id received from coordinator */ >>> static bool cmdIdRcvdAtDatanode; /* Has the coordinator sent >>> command id to the data node? */ >>> static bool enbCmdIdChgReporting; /* Should datanode report >>> command id changes to coordinator? */ >>> static bool enbCoordCmdIdSending; /* Should coordinator send >>> command id to the data node? */ >>> static CommandId cmdIdFromDatanode; /* Coordinator will receive >>> updated command id from datanode */ >>> However, cmdIdRcvdAtDatanode, cmdIdFromCoord and enbCmdIdChgReporting >>> are used only at Datanode. >>> cmdIdFromDatanode and enbCoordCmdIdSending are only used at Coordinator >>> level. >>> You should combine enbCoordCmdIdSending and enbCmdIdChgReporting into a >>> single variable with a generic name, like let's say sendCommandId or >>> reportCommandId. Specifying in comments the role of this unique variable >>> for local Coordinator and remote nodes will help. Those 2 variables play >>> the same role which is to control if a >>> In the case of Coordinator, this variable is flagged to true when >>> transforming INSERT statements (transformInsertStmt) and when a transaction >>> block is begun (BeginTransactionBlock). >>> In the case of a Datanode, this variable is flagged to true only when a >>> command ID has been received from Coordinator. >>> >>> > Done, I used functions to retain code readability and inside the functions > I used the same variables as u suggested. > > >> Then, you should combine cmdIdFromCoord and cmdIdFromDatanode into a >>> single variable, like receivedCommandId. In this case, being at Coordinator >>> or at Datanode means that this command ID has been received from a remode >>> node. If you have other ideas for names please feel free! >>> >> > Done, again I used functions to retain code readability. I however had to > add checks inside the functions to make sure that the functions intended > for data node work only for data nodes and the ones intended for > coordinators work only for coordinators. > > >> Also changing the name cmdIdRcvdAtDatanode to isCommandIdReceived might >>> help... >>> >> > Done > > >> Just a question, what means the preffix enb? (just by curiosity) :) >>> >>> 5) In xact.c, at the end of CommitTransaction, PrepareTransaction, >>> CleanupTransaction, you will need to change the cleanup of the variables if >>> you take into account comment 4). >>> >> > Done > > >> 6) In BeginTransactionBlock, PrepareTransactionBlock, EndTransactionBlock >>> and UserAbortTransactionBlock of xact.c, you need to set an additional >>> !IsConnFromCoord() at this place, this will avoid remote Coordinators >>> trying to send CommandIds to inexistent nodes. >>> if(IS_PGXC_COORDINATOR) >>> + SetCmdIdSending(false); >>> >> > Done > > >> >>> 7) You should combine the messages 'Y' and 'M'. The content and the >>> goals are the same. You can also eliminate the 2nd byte of 'Y' without >>> consequences I think. >>> >> > I did not combine these two. The intention of having them separate is > because 'Y' should not be tied to 'M', In future we might need data nodes > to notify coordinators of some thing else. And for the same reason I have > added a second byte in 'Y' so that we can have the same command notify > coordinators of some thing else. > > >> 8) Why didn't you put the reception of message 'M' in PostgresMain of >>> postgres.c at the same place as the reception of GXID, snapshot, timestamp, >>> etc. >>> Putting the reception of message at the end of SocketBackend makes the >>> code kind of inconsistent. If you keep it inside SocketBackend, you should >>> at least manage the message inside the switch, however I do not see any >>> particular reason why you receive this message in SocketBackend and not in >>> PostgresMain. So anything I should know? >>> You should also remove the flag IS_PGXC_DATANODE when receiving the >>> command ID. Remote Coordinator might need to be able to receive command IDs >>> as well, even if there is no direct usage now, but for stuff like triggers?? >>> >> > Done, I added handling of 'M' in PostgresMain. > > >> 9) I found some diffs with the test select_views. It might be worth to >>> look at what is happening. It is perhaps an environment-related issue. Btw, >>> my regression diffs are attached. >>> >> > I was also getting these diffs but it was difficult for me to think that > the changes I did could cause a change in oder of rows. Here is what I did > to dig the issue deeper. > > create table test_text(a text); > > insert into test_text values('Bancroft Ave'); > insert into test_text values('Bancroft Ave'); > insert into test_text values('Birch St'); > insert into test_text values('Birch St'); > insert into test_text values('Blacow Road'); > insert into test_text values('Bridgepointe Dr'); > insert into test_text values('Broadmore Ave'); > insert into test_text values('Broadway '); > insert into test_text values('B St'); > > select * from test_text order by 1 > produces same results with plain PG and XC > > Also I changed the query in plain PG select_views test case to > SELECT * FROM street ORDER BY name,cname,thepath::text; > and obtained same results with plain PG and XC. > > I am not sure why the test case was passing earlier with the current > possible expected outputs. > > Summarized design notes are attached with the mail for reference. > > 10) The modifications you are bringing to combocid_1.out are consistent >>> with postgres, cool! >>> >>> No performance impact with this patch! Tested and approved. This is good. >>> No regression impact with this patch. >>> >>> So, to conclude, I think it is a good piece of work, really cool stuff >>> that I believe will be helpful for triggers and even other things. >>> It just needs to be polished and simplified a bit. However basics are >>> here, and performance is not impacted. Really it was worth spending time on >>> that. The additional test cases also look to be sufficient, but I might be >>> missing smth so feel free to add new tests if necessary. >>> Thanks! >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- Michael Paquier http://michael.otacoo.com |
From: Abbas B. <abb...@en...> - 2012-07-19 17:16:07
|
-------------------- 1. Background -------------------- In a multi-statement transaction each statement is given a command identifier starting from zero and incrementing for each statement. These command indentifers are required for extra tracking because each statement has its own visibility rules with in the transaction. For example, a cursor’s contents must remain unchanged even if later statements in the same transaction modify rows. Such tracking is implemented using system command id columns cmin/cmax, which is internally actually is a single column. cmin/cmax come into play in case of multi-statement transactions only, they are both zero otherwise. cmin "The command identifier of the statement within the inserting transaction." cmax "The command identifier of the statement within the deleting transaction." Here are the visibility rules (taken from comments of tqual.c) ( // A heap tuple is valid "now" iff Xmin == my-transaction && // inserted by the current transaction Cmin < my-command && // before this command, and ( Xmax is null || // the row has not been deleted, or ( Xmax == my-transaction && // it was deleted by the current transaction Cmax >= my-command // but not before this command, ) ) ) || // or ( Xmin is committed && // the row was inserted by a committed transaction, and ( Xmax is null || // the row has not been deleted, or ( Xmax == my-transaction && // the row is being deleted by this transaction Cmax >= my-command) || // but it's not deleted "yet", or ( Xmax != my-transaction && // the row was deleted by another transaction Xmax is not committed // that has not been committed ) ) ) ) Because cmin and cmax are internally a single system column, it is therefore not possible to simply record the status of a row that is created and expired in the same multi-statement transaction. For that reason, a special combo command id is created that references a local memory hash that contains the actual cmin and cmax values. It means that if combo id is being used the number we are seeing would not be the cmin or cmax it will be an index into a local array that contains a structure with has the actual cmin and cmax values. -------------------- 2. Cursor Visibility Problem (CVP) -------------------- Consider the test case create table tt1(f1 int); BEGIN; insert into tt1 values(1); declare c50 cursor for select * from tt1; -- should show one row only insert into tt1 values(2); fetch all from c50; COMMIT; The fetch shows both inserted rows, where as it is supposed to show only 1. -------------------- 3. Solution of CVP -------------------- Consider Data node 1 log (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE] (b) [exec_simple_query][1026][drop table tt1;] (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE] (f) [exec_simple_query][1026][create table tt1(f1 int);] (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE] (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM tt1] (m) [exec_simple_query][1026][COMMIT TRANSACTION] The cursor currently shows both inserted rows because command id at data node in step (j) is 0 step (k) is 1 & step (l) is 2 Where as we need command ids to be step (j) should be 0 step (k) should be 2 & step (l) should be 1 This will solve the cursor visibility problem. To implement this we send command IDs to data nodes from the coordinator like we send gxid. The only difference will be that we do not need to take command IDs from GTM since they are only valid with in the transaction. A new message 'M' is used for this purpose. However sending command ids from coordinator to data node alone is not sufficient because data node can increment command ids e.g. in case of triggers fired at the data node or rules etc. These changes must be brought into the notice of the coordinator for it to decide the command id of the next command correctly. Hence we send changes in command ids from data node to the coordinator as well. The coordinator chooses the largest command id for the next command to issue. A new message 'Y' is used for this purpose. -------------------- 4. INSERT-SELECT Problem -------------------- Consider the use of INSERT-SELECT in case rows are inserted into a child by selecting from its parent. In this case XC was transforming the single INSERT-SELECT command into a multi-statement transaction block on the data nodes. [STEP 1.] START TRANSACTION ISOLATION LEVEL read committed READ WRITE [STEP 2.] SELECT col1,col2 FROM ONLY parent_table WHERE true [STEP 3.] INSERT INTO public.child_table (col1, col2) VALUES ($1, $2) [STEP 4.] (-- bind and execute here --) [STEP 5.] SELECT col1, col2 FROM ONLY child_table WHERE true [STEP 6.] INSERT INTO public.child_table (col1, col2) VALUES ($1, $2) [STEP 7.] (-- bind and execute here --) [STEP 8.] COMMIT TRANSACTION The inserts of step 3 will be visible to select of step 5, hence we were getting duplicate rows in the child table. -------------------- 5. Solution of INSERT-SELECT Problem -------------------- Note down the command id of the insert-select and use that as the command id of select query in [STEP 5.]. The infrastructure developed to exchange command ids is used to do so. |
From: Koichi S. <koi...@gm...> - 2012-07-19 16:09:45
|
If a table is distributed, you can define unique constraint only to the distribution key. This limitation is the same as those for PostgreSQL's partitioning. At present, XC provides constraints which can be enforced locally in datanodes. Regards; ---------- Koichi Suzuki 2012/7/19 Noe Gutierrez <noe...@gm...>: > I have installed postgres XC on fedora 16 successfully, Could you help me, > I need to create UNIQUE Constrain, but I don't how to do it > > > Thanks!!!!! > I.S.C. Noe Adahi Gutiérrez Romero > Oracle® Certified Professional, Java® SE 6 Programmer > IBM® InfoShere® Guardium® Technical Professional v1 > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
From: Noe G. <noe...@gm...> - 2012-07-19 14:38:33
|
I have installed postgres XC on fedora 16 successfully, Could you help me, I need to create UNIQUE Constrain, but I don't how to do it Thanks!!!!! *I.S.C. Noe Adahi Gutiérrez Romero* ***Oracle® Certified Professional, Java® SE 6 Programmer* *IBM® InfoShere® Guardium® Technical Professional v1* |
From: Abbas B. <abb...@en...> - 2012-07-19 06:01:09
|
Thanks Michael for the review, Thanks Ashutosh for the comments, Michael I will accommodate your valuable feedback and will send revised patch. Ashutosh I will send the summarized design in the email with the revised patch. BTW enb is short for enable! On Thu, Jul 19, 2012 at 10:37 AM, Michael Paquier <mic...@gm... > wrote: > Hi, > > Finally I took some time to look at this code, it is important support. > 1) The code is not realigned with current master (xc_misc.sql) and > contains whitespaces (2 in xc_misc.out, perhaps other files as well). So I > realigned it myself to test the code. > 2) In GetCurrentCommandId xact.c, you need to set !IsConnFromCoord() with > (IS_PGXC_COORDINATOR) to avoid remote coordinators incrementing the command > ID. Remote Coordinators do not receive any command IDs from remote nodes. > Also here using an if/else if structure won't hurt and bring clarity in the > operations. > 3) In BeginTransactionBlock of xact.c, you need to add !IsConnFromCoord() > as a secondary condition to avoid remote Coordinator problems. > 4) By looking at the patch It is possible to reduce the number of > variables used for the control of Command ID. In the current version of > your patch, you are using 5 static variables: > static CommandId cmdIdFromCoord; /* Datanode will use command id > received from coordinator */ > static bool cmdIdRcvdAtDatanode; /* Has the coordinator sent > command id to the data node? */ > static bool enbCmdIdChgReporting; /* Should datanode report > command id changes to coordinator? */ > static bool enbCoordCmdIdSending; /* Should coordinator send > command id to the data node? */ > static CommandId cmdIdFromDatanode; /* Coordinator will receive > updated command id from datanode */ > However, cmdIdRcvdAtDatanode, cmdIdFromCoord and enbCmdIdChgReporting are > used only at Datanode. > cmdIdFromDatanode and enbCoordCmdIdSending are only used at Coordinator > level. > You should combine enbCoordCmdIdSending and enbCmdIdChgReporting into a > single variable with a generic name, like let's say sendCommandId or > reportCommandId. Specifying in comments the role of this unique variable > for local Coordinator and remote nodes will help. Those 2 variables play > the same role which is to control if a > In the case of Coordinator, this variable is flagged to true when > transforming INSERT statements (transformInsertStmt) and when a transaction > block is begun (BeginTransactionBlock). > In the case of a Datanode, this variable is flagged to true only when a > command ID has been received from Coordinator. > > Then, you should combine cmdIdFromCoord and cmdIdFromDatanode into a > single variable, like receivedCommandId. In this case, being at Coordinator > or at Datanode means that this command ID has been received from a remode > node. If you have other ideas for names please feel free! > Also changing the name cmdIdRcvdAtDatanode to isCommandIdReceived might > help... > Just a question, what means the preffix enb? (just by curiosity) :) > > 5) In xact.c, at the end of CommitTransaction, PrepareTransaction, > CleanupTransaction, you will need to change the cleanup of the variables if > you take into account comment 4). > 6) In BeginTransactionBlock, PrepareTransactionBlock, EndTransactionBlock > and UserAbortTransactionBlock of xact.c, you need to set an additional > !IsConnFromCoord() at this place, this will avoid remote Coordinators > trying to send CommandIds to inexistent nodes. > if(IS_PGXC_COORDINATOR) > + SetCmdIdSending(false); > > 7) You should combine the messages 'Y' and 'M'. The content and the goals > are the same. You can also eliminate the 2nd byte of 'Y' without > consequences I think. > 8) Why didn't you put the reception of message 'M' in PostgresMain of > postgres.c at the same place as the reception of GXID, snapshot, timestamp, > etc. > Putting the reception of message at the end of SocketBackend makes the > code kind of inconsistent. If you keep it inside SocketBackend, you should > at least manage the message inside the switch, however I do not see any > particular reason why you receive this message in SocketBackend and not in > PostgresMain. So anything I should know? > You should also remove the flag IS_PGXC_DATANODE when receiving the > command ID. Remote Coordinator might need to be able to receive command IDs > as well, even if there is no direct usage now, but for stuff like triggers?? > 9) I found some diffs with the test select_views. It might be worth to > look at what is happening. It is perhaps an environment-related issue. Btw, > my regression diffs are attached. > 10) The modifications you are bringing to combocid_1.out are consistent > with postgres, cool! > > No performance impact with this patch! Tested and approved. This is good. > No regression impact with this patch. > > So, to conclude, I think it is a good piece of work, really cool stuff > that I believe will be helpful for triggers and even other things. > It just needs to be polished and simplified a bit. However basics are > here, and performance is not impacted. Really it was worth spending time on > that. The additional test cases also look to be sufficient, but I might be > missing smth so feel free to add new tests if necessary. > Thanks! > > -- > Michael Paquier > http://michael.otacoo.com > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Ashutosh B. <ash...@en...> - 2012-07-19 04:21:06
|
Hi Abbas, You have mentioned "design discussion earlier". Do we have that in some mail. Even better if you could put that down here in writing for ready reference. On Thu, Jul 19, 2012 at 4:26 AM, Abbas Butt <abb...@en...>wrote: > Hi, > Attached please find the patch that solves cursor visibility problem as > well as the inserting into child by selecting from the parent problem using > the design discussed earlier. > Test cases are added and regressions are fixed accordingly. > Patch is ready for review. > > Thanks. > > > On Tue, Jul 10, 2012 at 6:54 AM, Koichi Suzuki <koi...@gm...>wrote: > >> I basically agree on the idea. I also think Michael needs this kind >> of extension to feedback trigger target rows. We should design >> NoticeResponse extensible enough for various usage. >> >> Regards; >> ---------- >> Koichi Suzuki >> >> >> 2012/7/10 Abbas Butt <abb...@en...>: >> > Here is the update on the issue. >> > >> > It was decided that the changes done by the data nodes in the command id >> > should be communicated back to the coordinator and that the coordinator >> > should choose the largest of all the received values as the next >> command id. >> > >> > It was suggested that we should check that a skipped value of command id >> > should not create a problem for the next operations on the table. I have >> > verified both by studying the code and by actually changing the function >> > CommandCounterIncrement to increment the command id by 3 and running >> > regression. It worked fine so a hole in command id is not a problem. >> > >> > Next it was suggested that we should use the mechanism currently in >> place to >> > send # of tuples affected by a statement to communicate the changed >> command >> > id to the coordinator. >> > Please refer to this link in the documentation >> > http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html >> > Note that there is no such message format that exists in the current >> over >> > the wire protocol to communicate the # of tuples affected by a >> statement. >> > The libpq functions that we might suspect do the same are PQntuples and >> > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where as >> > PQcmdTuples extracts the # of tuples affected from the CommandComplete >> 'C' >> > message string. We cannot use these mechanisms for our purpose. >> > >> > I evaluated the use of NoticeResponse 'N' for sending changed command >> id but >> > the message format of NoticeResponse mandates the use of certain fields >> > which will make our sent messages un-necessarily bulky and would consume >> > network bandwidth for no reason. >> > >> > I therefore suggest the we use a new message for communicating XC >> specific >> > information from data node to coordinator. Currently we will use it for >> > command id but we will design the message format flexible enough to >> > accommodate future XC requirements. Whenever the data node increments >> > command id we will send the information to the coordinator and >> > handle_response function in execRemote.c would be changed to >> accommodate new >> > message. Since coordinators will never use the new message therefore the >> > existing clients do not need to bother. >> > >> > Comments or suggestions are welcome. >> > >> > Regards >> > >> > >> > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en... >> > >> > wrote: >> >> >> >> While fixing the regression failures resulting from the changes done by >> >> the patch I was able to fix all except this test case >> >> >> >> set enforce_two_phase_commit = off; >> >> >> >> CREATE TEMP TABLE users ( >> >> id INT PRIMARY KEY, >> >> name VARCHAR NOT NULL >> >> ) DISTRIBUTE BY REPLICATION; >> >> >> >> INSERT INTO users VALUES (1, 'Jozko'); >> >> INSERT INTO users VALUES (2, 'Ferko'); >> >> INSERT INTO users VALUES (3, 'Samko'); >> >> CREATE TEMP TABLE tasks ( >> >> id INT PRIMARY KEY, >> >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL >> >> ) DISTRIBUTE BY REPLICATION; >> >> >> >> INSERT INTO tasks VALUES (1,1,NULL,NULL); >> >> INSERT INTO tasks VALUES (2,2,2,NULL); >> >> INSERT INTO tasks VALUES (3,3,3,3); >> >> >> >> BEGIN; >> >> >> >> UPDATE tasks set id=id WHERE id=2; >> >> SELECT * FROM tasks; >> >> >> >> DELETE FROM users WHERE id = 2; >> >> SELECT * FROM tasks; >> >> >> >> COMMIT; >> >> >> >> The obtained output from the last select statement is >> >> >> >> id | owner | worker | checked_by >> >> ----+-------+--------+------------ >> >> 1 | 1 | | >> >> 3 | 3 | 3 | 3 >> >> 2 | 2 | 2 | >> >> (3 rows) >> >> >> >> where as the expected output is >> >> >> >> id | owner | worker | checked_by >> >> ----+-------+--------+------------ >> >> 1 | 1 | | >> >> 3 | 3 | 3 | 3 >> >> 2 | | | >> >> (3 rows) >> >> >> >> Note that the owner and worker have been set to null due to "ON DELETE >> SET >> >> NULL". >> >> >> >> Here is the reason why this does not work properly. Consider the last >> >> transaction >> >> >> >> BEGIN; >> >> >> >> UPDATE tasks set id=id WHERE id=2; >> >> SELECT * FROM tasks; >> >> >> >> DELETE FROM users WHERE id = 2; >> >> SELECT * FROM tasks; >> >> >> >> COMMIT; >> >> >> >> Here are the command id values the coordinator sends to the data node >> >> >> >> 0 for the first update that gets incremented to 1 because this is a DML >> >> and needs to consume a command id >> >> 1 for the first select that remains 1 since it is not required to be >> >> consumed. >> >> 1 for the delete statement that gets incremented to 2 because it is a >> DML >> >> and 2 for the last select. >> >> >> >> Now this is what happens on the data node >> >> >> >> When the data node receives the first update with command id 0, it >> >> increments it once due to the update itself and once due to the update >> run >> >> because of "ON UPDATE CASCADE". Hence the command id at the end of >> update on >> >> data node is 2. >> >> The first select comes to data node with command id 1, which is >> incorrect. >> >> The user's intention is to see data after update and its command id >> should >> >> be 2. >> >> Now delete comes with command id 1, and data node increments it once >> due >> >> to the delete itself and once due to the update run because of "ON >> DELETE >> >> SET NULL", hence the command id at the end of delete is 3. >> >> Coordinator now sends last select with command id 2, which is again >> >> incorrect since user's intention is to see data after delete and select >> >> should have been sent to data node with command id 3 or 4. >> >> >> >> Every time data node increments command id due to any statements run >> >> implicitly either because of the constraints or triggers, this scheme >> of >> >> sending command ids to data node from coordinator to solve fetch >> problems >> >> would fail. >> >> >> >> Datanode can have a trigger e.g. inserting rows thrice on every single >> >> insert and would increment command id on every insert. Therefore this >> design >> >> cannot work. >> >> >> >> Either we have to synchronize command ids between datanode and >> coordinator >> >> through GTM >> >> OR >> >> We will have to send the DECLARE CURSOR down to the datanode. In this >> case >> >> however we will not be able to send the cursor query as it is because >> the >> >> query might contain a join on two tables which exist on a disjoint set >> of >> >> data nodes. >> >> >> >> Comments or suggestions are welcome. >> >> >> >> >> >> >> >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt < >> abb...@en...> >> >> wrote: >> >>> >> >>> Thanks for your comments. >> >>> >> >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat >> >>> <ash...@en...> wrote: >> >>>> >> >>>> Hi Abbas, >> >>>> I have few comments to make >> >>>> 1. With this patch there are two variables for having command Id, >> that >> >>>> is going to cause confusion and will be a maintenance burden, might >> be error >> >>>> prone. Is it possible to use a single variable instead of two? >> >>> >> >>> >> >>> Are you talking about receivedCommandId and currentCommandId? If yes, >> I >> >>> would prefer not having a packet received from coordinator overwrite >> the >> >>> currentCommandId at data node, because I am not 100% sure about the >> life >> >>> time of currentCommandId, I might overwrite it before time. It would >> be safe >> >>> to let currentCommandId as is unless we are compelled to get the next >> >>> command ID, and have the received command id take priority at that >> time. >> >>> >> >>>> >> >>>> Right now there is some code which is specific to cursors in your >> patch. >> >>>> If you can plug the coordinator command id somehow into >> currentCommandId, >> >>>> you won't need that code and any other code which needs coordinator >> command >> >>>> ID will be automatically taken care of. >> >>> >> >>> >> >>> That code is required to solve a problem. Consider this case when a >> >>> coordinator received this transaction >> >>> >> >>> >> >>> BEGIN; >> >>> insert into tt1 values(1); >> >>> declare c50 cursor for select * from tt1; >> >>> insert into tt1 values(2); >> >>> fetch all from c50; >> >>> COMMIT; >> >>> >> >>> While sending select to the data node in response to a fetch we need >> to >> >>> know what was the command ID of the declare cursor statement and we >> need to >> >>> send that command ID to the data node for this particular fetch. This >> is the >> >>> main idea behind this solution. >> >>> >> >>> The first insert goes to the data node with command id 0, the second >> >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When >> >>> coordinator sees fetch it needs to send select to the data node with >> command >> >>> ID 1 rather than 3. >> >>> >> >>> >> >>>> >> >>>> 2. A non-transaction on coordinator can spawn tranasactions on >> datanode >> >>>> or subtransactions (if there is already a transaction running). Does >> your >> >>>> patch handle that case? >> >>> >> >>> >> >>> No and it does not need to, because that case has no known problems >> that >> >>> we need to solve. I don't think my patch would impact any such case >> but I >> >>> will analyze any failures that I may get in regressions. >> >>> >> >>>> >> >>>> Should we do more thorough research in the transaction management, >> esp. >> >>>> to see the impact of getting same command id for two commands on the >> >>>> datanode? >> >>> >> >>> >> >>> If we issue two commands with the same command ID then we will >> definitely >> >>> have visibility issues according to the rules I have already >> explained. But >> >>> we will not have two commands sent to the data node with same command >> id. >> >>> >> >>>> >> >>>> >> >>>> >> >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt >> >>>> <abb...@en...> wrote: >> >>>>> >> >>>>> Hi Ashutosh, >> >>>>> Here are the results with the val column, Thanks. >> >>>>> >> >>>>> test=# drop table mvcc_demo; >> >>>>> DROP TABLE >> >>>>> test=# >> >>>>> test=# create table mvcc_demo (val int); >> >>>>> CREATE TABLE >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> test=# >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >> >>>>> combo id would be different >> >>>>> DELETE 0 >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 0 >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 0 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80689 | 0 | 3 | f >> >>>>> 80689 | 0 | 4 | f >> >>>>> 80689 | 0 | 5 | f >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80689 | 0 | 3 | 3 | 1 >> >>>>> 80689 | 0 | 4 | 4 | 2 >> >>>>> 80689 | 0 | 5 | 5 | 3 >> >>>>> >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 3 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80689 | 80689 | 0 | t >> >>>>> 80689 | 80689 | 1 | t >> >>>>> 80689 | 80689 | 2 | t >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> ------+------+------+------+----- >> >>>>> (0 rows) >> >>>>> >> >>>>> >> >>>>> test=# >> >>>>> test=# END; >> >>>>> COMMIT >> >>>>> test=# >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80693 | 0 | 0 | f >> >>>>> 80693 | 0 | 1 | f >> >>>>> 80693 | 0 | 2 | f >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80693 | 0 | 0 | 0 | 1 >> >>>>> 80693 | 0 | 1 | 1 | 2 >> >>>>> 80693 | 0 | 2 | 2 | 3 >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# UPDATE mvcc_demo SET val = 10; >> >>>>> >> >>>>> UPDATE 3 >> >>>>> test=# >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80693 | 80693 | 0 | t >> >>>>> 80693 | 80693 | 1 | t >> >>>>> 80693 | 80693 | 2 | t >> >>>>> 80693 | 0 | 3 | f >> >>>>> 80693 | 0 | 3 | f >> >>>>> 80693 | 0 | 3 | f >> >>>>> (6 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> (3 rows) >> >>>>> >> >>>>> >> >>>>> test=# >> >>>>> test=# END; >> >>>>> COMMIT >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> -- From one psql issue >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80699 | 0 | 0 | f >> >>>>> (1 row) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80699 | 0 | 0 | 0 | 1 >> >>>>> (1 row) >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# -- From another issue >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80699 | 0 | 0 | f >> >>>>> 80700 | 0 | 0 | f >> >>>>> 80700 | 0 | 1 | f >> >>>>> 80700 | 0 | 2 | f >> >>>>> (4 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80699 | 0 | 0 | 0 | 1 >> >>>>> 80700 | 0 | 0 | 0 | 2 >> >>>>> 80700 | 0 | 1 | 1 | 3 >> >>>>> 80700 | 0 | 2 | 2 | 4 >> >>>>> (4 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# UPDATE mvcc_demo SET val = 10; >> >>>>> >> >>>>> UPDATE 4 >> >>>>> test=# >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80700 | 80700 | 0 | t >> >>>>> 80700 | 80700 | 1 | t >> >>>>> 80700 | 80700 | 2 | t >> >>>>> 80699 | 80700 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> (8 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> (4 rows) >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# -- Before finishing this, issue these from the first psql >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80700 | 80700 | 0 | t >> >>>>> 80700 | 80700 | 1 | t >> >>>>> 80700 | 80700 | 2 | t >> >>>>> 80699 | 80700 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> (8 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+-------+------+------+----- >> >>>>> 80699 | 80700 | 3 | 3 | 1 >> >>>>> (1 row) >> >>>>> >> >>>>> test=# end; >> >>>>> COMMIT >> >>>>> >> >>>>> >> >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier >> >>>>> <mic...@gm...> wrote: >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, >> so >> >>>>>> be sure to send it to remote Datanodes really only if necessary. >> >>>>>> You should put more severe conditions blocking this function cid >> can >> >>>>>> easily get incremented in Postgres. >> >>>>>> >> >>>>>> Regards, >> >>>>>> >> >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt >> >>>>>> <abb...@en...> wrote: >> >>>>>>> >> >>>>>>> PFA a WIP patch implementing the design presented earlier. >> >>>>>>> The patch is WIP because it still has and FIXME and it shows some >> >>>>>>> regression failures that need to be fixed, but other than that it >> confirms >> >>>>>>> that the suggested design would work fine. The following test >> cases now work >> >>>>>>> fine >> >>>>>>> >> >>>>>>> drop table tt1; >> >>>>>>> create table tt1(f1 int) distribute by replication; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> >> >>>>>>> insert into tt1 values(2); >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> insert into tt1 values(2); >> >>>>>>> >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(3); >> >>>>>>> >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> declare c51 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(3); >> >>>>>>> fetch all from c50; >> >>>>>>> fetch all from c51; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> declare c51 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> insert into tt1 values(3); >> >>>>>>> fetch all from c50; >> >>>>>>> fetch all from c51; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt >> >>>>>>> <abb...@en...> wrote: >> >>>>>>>> >> >>>>>>>> Hi, >> >>>>>>>> >> >>>>>>>> In a multi-statement transaction each statement is given a >> command >> >>>>>>>> identifier >> >>>>>>>> starting from zero and incrementing for each statement. >> >>>>>>>> These command indentifers are required for extra tracking because >> >>>>>>>> each >> >>>>>>>> statement has its own visibility rules with in the transaction. >> >>>>>>>> For example, a cursor’s contents must remain unchanged even if >> later >> >>>>>>>> statements in the >> >>>>>>>> same transaction modify rows. Such tracking is implemented using >> >>>>>>>> system command id >> >>>>>>>> columns cmin/cmax, which is internally actually is a single >> column. >> >>>>>>>> >> >>>>>>>> cmin/cmax come into play in case of multi-statement transactions >> >>>>>>>> only, >> >>>>>>>> they are both zero otherwise. >> >>>>>>>> >> >>>>>>>> cmin "The command identifier of the statement within the >> inserting >> >>>>>>>> transaction." >> >>>>>>>> cmax "The command identifier of the statement within the deleting >> >>>>>>>> transaction." >> >>>>>>>> >> >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) >> >>>>>>>> >> >>>>>>>> ( // A heap tuple is valid >> >>>>>>>> "now" iff >> >>>>>>>> Xmin == my-transaction && // inserted by the >> current >> >>>>>>>> transaction >> >>>>>>>> Cmin < my-command && // before this command, >> and >> >>>>>>>> ( >> >>>>>>>> Xmax is null || // the row has not been >> >>>>>>>> deleted, or >> >>>>>>>> ( >> >>>>>>>> Xmax == my-transaction && // it was deleted by the >> >>>>>>>> current transaction >> >>>>>>>> Cmax >= my-command // but not before this >> >>>>>>>> command, >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> || // or >> >>>>>>>> ( >> >>>>>>>> Xmin is committed && // the row was inserted >> by >> >>>>>>>> a committed transaction, and >> >>>>>>>> ( >> >>>>>>>> Xmax is null || // the row has not been >> >>>>>>>> deleted, or >> >>>>>>>> ( >> >>>>>>>> Xmax == my-transaction && // the row is being >> deleted >> >>>>>>>> by this transaction >> >>>>>>>> Cmax >= my-command) || // but it's not deleted >> >>>>>>>> "yet", or >> >>>>>>>> ( >> >>>>>>>> Xmax != my-transaction && // the row was deleted >> by >> >>>>>>>> another transaction >> >>>>>>>> Xmax is not committed // that has not been >> >>>>>>>> committed >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> >> >>>>>>>> Because cmin and cmax are internally a single system column, >> >>>>>>>> it is therefore not possible to simply record the status of a row >> >>>>>>>> that is created and expired in the same multi-statement >> transaction. >> >>>>>>>> For that reason, a special combo command id is created that >> >>>>>>>> references >> >>>>>>>> a local memory hash that contains the actual cmin and cmax >> values. >> >>>>>>>> It means that if combo id is being used the number we are seeing >> >>>>>>>> would not be the cmin or cmax it will be an index into a local >> >>>>>>>> array that contains a structure with has the actual cmin and cmax >> >>>>>>>> values. >> >>>>>>>> >> >>>>>>>> The following queries (taken mostly from >> >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >> >>>>>>>> use the contrib module pageinspect, which allows >> >>>>>>>> visibility of internal heap page structures and all stored rows, >> >>>>>>>> including those not visible in the current snapshot. >> >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >> >>>>>>>> >> >>>>>>>> We are exploring 3 examples here: >> >>>>>>>> 1) INSERT & DELETE in a single transaction >> >>>>>>>> 2) INSERT & UPDATE in a single transaction >> >>>>>>>> 3) INSERT from two different transactions & UPDATE from one >> >>>>>>>> >> >>>>>>>> test=# drop table mvcc_demo; >> >>>>>>>> DROP TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# create table mvcc_demo (val int); >> >>>>>>>> CREATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show >> that >> >>>>>>>> combo id would be different >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80685 | 0 | 3 | f >> >>>>>>>> 80685 | 0 | 4 | f >> >>>>>>>> 80685 | 0 | 5 | f >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 3 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80685 | 80685 | 0 | t >> >>>>>>>> 80685 | 80685 | 1 | t >> >>>>>>>> 80685 | 80685 | 2 | t >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax >> >>>>>>>> they are actually >> >>>>>>>> the indexes of the internal array already explained above. >> >>>>>>>> combo id index 0 would contain cmin 3, cmax 6 >> >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 >> >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80675 | 0 | 0 | f >> >>>>>>>> 80675 | 0 | 1 | f >> >>>>>>>> 80675 | 0 | 2 | f >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >> >>>>>>>> UPDATE 3 >> >>>>>>>> test=# >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80675 | 80675 | 0 | t >> >>>>>>>> 80675 | 80675 | 1 | t >> >>>>>>>> 80675 | 80675 | 2 | t >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> (6 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> >> >>>>>>>> -- From one psql issue >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool >> AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80677 | 0 | 0 | f >> >>>>>>>> (1 row) >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> test=# -- From another issue >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80677 | 0 | 0 | f >> >>>>>>>> 80678 | 0 | 0 | f >> >>>>>>>> 80678 | 0 | 1 | f >> >>>>>>>> 80678 | 0 | 2 | f >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >> >>>>>>>> UPDATE 4 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80678 | 80678 | 0 | t >> >>>>>>>> 80678 | 80678 | 1 | t >> >>>>>>>> 80678 | 80678 | 2 | t >> >>>>>>>> 80677 | 80678 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> (8 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> >> >>>>>>>> test=# -- Before finishing this, issue these from the first >> psql >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool >> AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80678 | 80678 | 0 | t >> >>>>>>>> 80678 | 80678 | 1 | t >> >>>>>>>> 80678 | 80678 | 2 | t >> >>>>>>>> 80677 | 80678 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> (8 rows) >> >>>>>>>> >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Now consider the case we are trying to solve >> >>>>>>>> >> >>>>>>>> drop table tt1; >> >>>>>>>> create table tt1(f1 int); >> >>>>>>>> >> >>>>>>>> BEGIN; >> >>>>>>>> insert into tt1 values(1); >> >>>>>>>> declare c50 cursor for select * from tt1; -- should show one row >> >>>>>>>> only >> >>>>>>>> insert into tt1 values(2); >> >>>>>>>> fetch all from c50; >> >>>>>>>> COMMIT; >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Consider Data node 1 log >> >>>>>>>> >> >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >> >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >> >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >> >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >> >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >> >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >> >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >> >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >> >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() >> >>>>>>>> FROM tt1] >> >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >> >>>>>>>> >> >>>>>>>> The cursor currently shows both inserted rows because command id >> at >> >>>>>>>> data node in >> >>>>>>>> step (j) is 0 >> >>>>>>>> step (k) is 1 & >> >>>>>>>> step (l) is 2 >> >>>>>>>> >> >>>>>>>> Where as we need command ids to be >> >>>>>>>> >> >>>>>>>> step (j) should be 0 >> >>>>>>>> step (k) should be 2 & >> >>>>>>>> step (l) should be 1 >> >>>>>>>> >> >>>>>>>> This will solve the cursor visibility problem. >> >>>>>>>> >> >>>>>>>> To implement this I suggest we send command IDs to data nodes >> from >> >>>>>>>> the coordinator >> >>>>>>>> like we send gxid. The only difference will be that we do not >> need >> >>>>>>>> to take command IDs >> >>>>>>>> from GTM since they are only valid with in the transaction. >> >>>>>>>> >> >>>>>>>> See this example >> >>>>>>>> >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> ------+------+------+------+---- >> >>>>>>>> (0 rows) >> >>>>>>>> >> >>>>>>>> test=# begin; >> >>>>>>>> BEGIN >> >>>>>>>> test=# insert into tt1 values(1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> (1 row) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> (2 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(4); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# end; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(5); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >> >>>>>>>> (5 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(6); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >> >>>>>>>> 80617 | 0 | 0 | 0 | 6 >> >>>>>>>> (6 rows) >> >>>>>>>> >> >>>>>>>> Note that at the end of the multi-statement transaction the >> command >> >>>>>>>> id gets reset to zero. >> >>>>>>>> >> >>>>>>>> -- >> >>>>>>>> Abbas >> >>>>>>>> Architect >> >>>>>>>> EnterpriseDB Corporation >> >>>>>>>> The Enterprise PostgreSQL Company >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> -- >> >>>>>>> Abbas >> >>>>>>> Architect >> >>>>>>> EnterpriseDB Corporation >> >>>>>>> The Enterprise PostgreSQL Company >> >>>>>>> >> >>>>>>> Phone: 92-334-5100153 >> >>>>>>> >> >>>>>>> Website: www.enterprisedb.com >> >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>>>>>> >> >>>>>>> This e-mail message (and any attachment) is intended for the use >> of >> >>>>>>> the individual or entity to whom it is addressed. This message >> >>>>>>> contains information from EnterpriseDB Corporation that may be >> >>>>>>> privileged, confidential, or exempt from disclosure under >> applicable >> >>>>>>> law. If you are not the intended recipient or authorized to >> receive >> >>>>>>> this for the intended recipient, any use, dissemination, >> >>>>>>> distribution, >> >>>>>>> retention, archiving, or copying of this communication is strictly >> >>>>>>> prohibited. If you have received this e-mail in error, please >> notify >> >>>>>>> the sender immediately by reply e-mail and delete this message. >> >>>>>>> >> >>>>>>> >> >>>>>>> >> ------------------------------------------------------------------------------ >> >>>>>>> Live Security Virtual Conference >> >>>>>>> Exclusive live event will cover all the ways today's security and >> >>>>>>> threat landscape has changed and how IT managers can respond. >> >>>>>>> Discussions >> >>>>>>> will include endpoint security, mobile security and the latest in >> >>>>>>> malware >> >>>>>>> threats. >> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>>>>>> _______________________________________________ >> >>>>>>> Postgres-xc-developers mailing list >> >>>>>>> Pos...@li... >> >>>>>>> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> -- >> >>>>>> Michael Paquier >> >>>>>> http://michael.otacoo.com >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> -- >> >>>>> Abbas >> >>>>> Architect >> >>>>> EnterpriseDB Corporation >> >>>>> The Enterprise PostgreSQL Company >> >>>>> >> >>>>> Phone: 92-334-5100153 >> >>>>> >> >>>>> Website: www.enterprisedb.com >> >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>>>> >> >>>>> This e-mail message (and any attachment) is intended for the use of >> >>>>> the individual or entity to whom it is addressed. This message >> >>>>> contains information from EnterpriseDB Corporation that may be >> >>>>> privileged, confidential, or exempt from disclosure under applicable >> >>>>> law. If you are not the intended recipient or authorized to receive >> >>>>> this for the intended recipient, any use, dissemination, >> distribution, >> >>>>> retention, archiving, or copying of this communication is strictly >> >>>>> prohibited. If you have received this e-mail in error, please notify >> >>>>> the sender immediately by reply e-mail and delete this message. >> >>>>> >> >>>>> >> >>>>> >> ------------------------------------------------------------------------------ >> >>>>> Live Security Virtual Conference >> >>>>> Exclusive live event will cover all the ways today's security and >> >>>>> threat landscape has changed and how IT managers can respond. >> >>>>> Discussions >> >>>>> will include endpoint security, mobile security and the latest in >> >>>>> malware >> >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>>>> _______________________________________________ >> >>>>> Postgres-xc-developers mailing list >> >>>>> Pos...@li... >> >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>>>> >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> Best Wishes, >> >>>> Ashutosh Bapat >> >>>> EntepriseDB Corporation >> >>>> The Enterprise Postgres Company >> >>>> >> >>> >> >>> >> >>> >> >>> -- >> >>> -- >> >>> Abbas >> >>> Architect >> >>> EnterpriseDB Corporation >> >>> The Enterprise PostgreSQL Company >> >>> >> >>> Phone: 92-334-5100153 >> >>> >> >>> Website: www.enterprisedb.com >> >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>> >> >>> This e-mail message (and any attachment) is intended for the use of >> >>> the individual or entity to whom it is addressed. This message >> >>> contains information from EnterpriseDB Corporation that may be >> >>> privileged, confidential, or exempt from disclosure under applicable >> >>> law. If you are not the intended recipient or authorized to receive >> >>> this for the intended recipient, any use, dissemination, distribution, >> >>> retention, archiving, or copying of this communication is strictly >> >>> prohibited. If you have received this e-mail in error, please notify >> >>> the sender immediately by reply e-mail and delete this message. >> >> >> >> >> >> >> >> >> >> -- >> >> -- >> >> Abbas >> >> Architect >> >> EnterpriseDB Corporation >> >> The Enterprise PostgreSQL Company >> >> >> >> Phone: 92-334-5100153 >> >> >> >> Website: www.enterprisedb.com >> >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> >> >> This e-mail message (and any attachment) is intended for the use of >> >> the individual or entity to whom it is addressed. This message >> >> contains information from EnterpriseDB Corporation that may be >> >> privileged, confidential, or exempt from disclosure under applicable >> >> law. If you are not the intended recipient or authorized to receive >> >> this for the intended recipient, any use, dissemination, distribution, >> >> retention, archiving, or copying of this communication is strictly >> >> prohibited. If you have received this e-mail in error, please notify >> >> the sender immediately by reply e-mail and delete this message. >> > >> > >> > >> > >> > -- >> > -- >> > Abbas >> > Architect >> > EnterpriseDB Corporation >> > The Enterprise PostgreSQL Company >> > >> > Phone: 92-334-5100153 >> > >> > Website: www.enterprisedb.com >> > EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> > Follow us on Twitter: http://www.twitter.com/enterprisedb >> > >> > This e-mail message (and any attachment) is intended for the use of >> > the individual or entity to whom it is addressed. This message >> > contains information from EnterpriseDB Corporation that may be >> > privileged, confidential, or exempt from disclosure under applicable >> > law. If you are not the intended recipient or authorized to receive >> > this for the intended recipient, any use, dissemination, distribution, >> > retention, archiving, or copying of this communication is strictly >> > prohibited. If you have received this e-mail in error, please notify >> > the sender immediately by reply e-mail and delete this message. >> > >> > >> ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> Discussions >> > will include endpoint security, mobile security and the latest in >> malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Abbas B. <abb...@en...> - 2012-07-19 04:07:04
|
Thanks Michael. On Thu, Jul 19, 2012 at 5:01 AM, Michael Paquier <mic...@gm...>wrote: > Great! I will try to grab some time to look into that, perhaps not today > as I have several meetings. > > > On Thu, Jul 19, 2012 at 7:56 AM, Abbas Butt <abb...@en...>wrote: > >> Hi, >> Attached please find the patch that solves cursor visibility problem as >> well as the inserting into child by selecting from the parent problem using >> the design discussed earlier. >> Test cases are added and regressions are fixed accordingly. >> Patch is ready for review. >> >> Thanks. >> >> >> On Tue, Jul 10, 2012 at 6:54 AM, Koichi Suzuki <koi...@gm... >> > wrote: >> >>> I basically agree on the idea. I also think Michael needs this kind >>> of extension to feedback trigger target rows. We should design >>> NoticeResponse extensible enough for various usage. >>> >>> Regards; >>> ---------- >>> Koichi Suzuki >>> >>> >>> 2012/7/10 Abbas Butt <abb...@en...>: >>> > Here is the update on the issue. >>> > >>> > It was decided that the changes done by the data nodes in the command >>> id >>> > should be communicated back to the coordinator and that the coordinator >>> > should choose the largest of all the received values as the next >>> command id. >>> > >>> > It was suggested that we should check that a skipped value of command >>> id >>> > should not create a problem for the next operations on the table. I >>> have >>> > verified both by studying the code and by actually changing the >>> function >>> > CommandCounterIncrement to increment the command id by 3 and running >>> > regression. It worked fine so a hole in command id is not a problem. >>> > >>> > Next it was suggested that we should use the mechanism currently in >>> place to >>> > send # of tuples affected by a statement to communicate the changed >>> command >>> > id to the coordinator. >>> > Please refer to this link in the documentation >>> > >>> http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html >>> > Note that there is no such message format that exists in the current >>> over >>> > the wire protocol to communicate the # of tuples affected by a >>> statement. >>> > The libpq functions that we might suspect do the same are PQntuples and >>> > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where >>> as >>> > PQcmdTuples extracts the # of tuples affected from the CommandComplete >>> 'C' >>> > message string. We cannot use these mechanisms for our purpose. >>> > >>> > I evaluated the use of NoticeResponse 'N' for sending changed command >>> id but >>> > the message format of NoticeResponse mandates the use of certain fields >>> > which will make our sent messages un-necessarily bulky and would >>> consume >>> > network bandwidth for no reason. >>> > >>> > I therefore suggest the we use a new message for communicating XC >>> specific >>> > information from data node to coordinator. Currently we will use it for >>> > command id but we will design the message format flexible enough to >>> > accommodate future XC requirements. Whenever the data node increments >>> > command id we will send the information to the coordinator and >>> > handle_response function in execRemote.c would be changed to >>> accommodate new >>> > message. Since coordinators will never use the new message therefore >>> the >>> > existing clients do not need to bother. >>> > >>> > Comments or suggestions are welcome. >>> > >>> > Regards >>> > >>> > >>> > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt < >>> abb...@en...> >>> > wrote: >>> >> >>> >> While fixing the regression failures resulting from the changes done >>> by >>> >> the patch I was able to fix all except this test case >>> >> >>> >> set enforce_two_phase_commit = off; >>> >> >>> >> CREATE TEMP TABLE users ( >>> >> id INT PRIMARY KEY, >>> >> name VARCHAR NOT NULL >>> >> ) DISTRIBUTE BY REPLICATION; >>> >> >>> >> INSERT INTO users VALUES (1, 'Jozko'); >>> >> INSERT INTO users VALUES (2, 'Ferko'); >>> >> INSERT INTO users VALUES (3, 'Samko'); >>> >> CREATE TEMP TABLE tasks ( >>> >> id INT PRIMARY KEY, >>> >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >>> >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >>> >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL >>> >> ) DISTRIBUTE BY REPLICATION; >>> >> >>> >> INSERT INTO tasks VALUES (1,1,NULL,NULL); >>> >> INSERT INTO tasks VALUES (2,2,2,NULL); >>> >> INSERT INTO tasks VALUES (3,3,3,3); >>> >> >>> >> BEGIN; >>> >> >>> >> UPDATE tasks set id=id WHERE id=2; >>> >> SELECT * FROM tasks; >>> >> >>> >> DELETE FROM users WHERE id = 2; >>> >> SELECT * FROM tasks; >>> >> >>> >> COMMIT; >>> >> >>> >> The obtained output from the last select statement is >>> >> >>> >> id | owner | worker | checked_by >>> >> ----+-------+--------+------------ >>> >> 1 | 1 | | >>> >> 3 | 3 | 3 | 3 >>> >> 2 | 2 | 2 | >>> >> (3 rows) >>> >> >>> >> where as the expected output is >>> >> >>> >> id | owner | worker | checked_by >>> >> ----+-------+--------+------------ >>> >> 1 | 1 | | >>> >> 3 | 3 | 3 | 3 >>> >> 2 | | | >>> >> (3 rows) >>> >> >>> >> Note that the owner and worker have been set to null due to "ON >>> DELETE SET >>> >> NULL". >>> >> >>> >> Here is the reason why this does not work properly. Consider the last >>> >> transaction >>> >> >>> >> BEGIN; >>> >> >>> >> UPDATE tasks set id=id WHERE id=2; >>> >> SELECT * FROM tasks; >>> >> >>> >> DELETE FROM users WHERE id = 2; >>> >> SELECT * FROM tasks; >>> >> >>> >> COMMIT; >>> >> >>> >> Here are the command id values the coordinator sends to the data node >>> >> >>> >> 0 for the first update that gets incremented to 1 because this is a >>> DML >>> >> and needs to consume a command id >>> >> 1 for the first select that remains 1 since it is not required to be >>> >> consumed. >>> >> 1 for the delete statement that gets incremented to 2 because it is a >>> DML >>> >> and 2 for the last select. >>> >> >>> >> Now this is what happens on the data node >>> >> >>> >> When the data node receives the first update with command id 0, it >>> >> increments it once due to the update itself and once due to the >>> update run >>> >> because of "ON UPDATE CASCADE". Hence the command id at the end of >>> update on >>> >> data node is 2. >>> >> The first select comes to data node with command id 1, which is >>> incorrect. >>> >> The user's intention is to see data after update and its command id >>> should >>> >> be 2. >>> >> Now delete comes with command id 1, and data node increments it once >>> due >>> >> to the delete itself and once due to the update run because of "ON >>> DELETE >>> >> SET NULL", hence the command id at the end of delete is 3. >>> >> Coordinator now sends last select with command id 2, which is again >>> >> incorrect since user's intention is to see data after delete and >>> select >>> >> should have been sent to data node with command id 3 or 4. >>> >> >>> >> Every time data node increments command id due to any statements run >>> >> implicitly either because of the constraints or triggers, this scheme >>> of >>> >> sending command ids to data node from coordinator to solve fetch >>> problems >>> >> would fail. >>> >> >>> >> Datanode can have a trigger e.g. inserting rows thrice on every single >>> >> insert and would increment command id on every insert. Therefore this >>> design >>> >> cannot work. >>> >> >>> >> Either we have to synchronize command ids between datanode and >>> coordinator >>> >> through GTM >>> >> OR >>> >> We will have to send the DECLARE CURSOR down to the datanode. In this >>> case >>> >> however we will not be able to send the cursor query as it is because >>> the >>> >> query might contain a join on two tables which exist on a disjoint >>> set of >>> >> data nodes. >>> >> >>> >> Comments or suggestions are welcome. >>> >> >>> >> >>> >> >>> >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt < >>> abb...@en...> >>> >> wrote: >>> >>> >>> >>> Thanks for your comments. >>> >>> >>> >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat >>> >>> <ash...@en...> wrote: >>> >>>> >>> >>>> Hi Abbas, >>> >>>> I have few comments to make >>> >>>> 1. With this patch there are two variables for having command Id, >>> that >>> >>>> is going to cause confusion and will be a maintenance burden, might >>> be error >>> >>>> prone. Is it possible to use a single variable instead of two? >>> >>> >>> >>> >>> >>> Are you talking about receivedCommandId and currentCommandId? If >>> yes, I >>> >>> would prefer not having a packet received from coordinator overwrite >>> the >>> >>> currentCommandId at data node, because I am not 100% sure about the >>> life >>> >>> time of currentCommandId, I might overwrite it before time. It would >>> be safe >>> >>> to let currentCommandId as is unless we are compelled to get the next >>> >>> command ID, and have the received command id take priority at that >>> time. >>> >>> >>> >>>> >>> >>>> Right now there is some code which is specific to cursors in your >>> patch. >>> >>>> If you can plug the coordinator command id somehow into >>> currentCommandId, >>> >>>> you won't need that code and any other code which needs coordinator >>> command >>> >>>> ID will be automatically taken care of. >>> >>> >>> >>> >>> >>> That code is required to solve a problem. Consider this case when a >>> >>> coordinator received this transaction >>> >>> >>> >>> >>> >>> BEGIN; >>> >>> insert into tt1 values(1); >>> >>> declare c50 cursor for select * from tt1; >>> >>> insert into tt1 values(2); >>> >>> fetch all from c50; >>> >>> COMMIT; >>> >>> >>> >>> While sending select to the data node in response to a fetch we need >>> to >>> >>> know what was the command ID of the declare cursor statement and we >>> need to >>> >>> send that command ID to the data node for this particular fetch. >>> This is the >>> >>> main idea behind this solution. >>> >>> >>> >>> The first insert goes to the data node with command id 0, the second >>> >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When >>> >>> coordinator sees fetch it needs to send select to the data node with >>> command >>> >>> ID 1 rather than 3. >>> >>> >>> >>> >>> >>>> >>> >>>> 2. A non-transaction on coordinator can spawn tranasactions on >>> datanode >>> >>>> or subtransactions (if there is already a transaction running). >>> Does your >>> >>>> patch handle that case? >>> >>> >>> >>> >>> >>> No and it does not need to, because that case has no known problems >>> that >>> >>> we need to solve. I don't think my patch would impact any such case >>> but I >>> >>> will analyze any failures that I may get in regressions. >>> >>> >>> >>>> >>> >>>> Should we do more thorough research in the transaction management, >>> esp. >>> >>>> to see the impact of getting same command id for two commands on the >>> >>>> datanode? >>> >>> >>> >>> >>> >>> If we issue two commands with the same command ID then we will >>> definitely >>> >>> have visibility issues according to the rules I have already >>> explained. But >>> >>> we will not have two commands sent to the data node with same >>> command id. >>> >>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt >>> >>>> <abb...@en...> wrote: >>> >>>>> >>> >>>>> Hi Ashutosh, >>> >>>>> Here are the results with the val column, Thanks. >>> >>>>> >>> >>>>> test=# drop table mvcc_demo; >>> >>>>> DROP TABLE >>> >>>>> test=# >>> >>>>> test=# create table mvcc_demo (val int); >>> >>>>> CREATE TABLE >>> >>>>> test=# >>> >>>>> test=# TRUNCATE mvcc_demo; >>> >>>>> TRUNCATE TABLE >>> >>>>> test=# >>> >>>>> test=# BEGIN; >>> >>>>> BEGIN >>> >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>> >>>>> combo id would be different >>> >>>>> DELETE 0 >>> >>>>> test=# DELETE FROM mvcc_demo; >>> >>>>> DELETE 0 >>> >>>>> test=# DELETE FROM mvcc_demo; >>> >>>>> DELETE 0 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>> INSERT 0 1 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+------+-----------+------------- >>> >>>>> 80689 | 0 | 3 | f >>> >>>>> 80689 | 0 | 4 | f >>> >>>>> 80689 | 0 | 5 | f >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80689 | 0 | 3 | 3 | 1 >>> >>>>> 80689 | 0 | 4 | 4 | 2 >>> >>>>> 80689 | 0 | 5 | 5 | 3 >>> >>>>> >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# DELETE FROM mvcc_demo; >>> >>>>> DELETE 3 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+-------+-----------+------------- >>> >>>>> 80689 | 80689 | 0 | t >>> >>>>> 80689 | 80689 | 1 | t >>> >>>>> 80689 | 80689 | 2 | t >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> ------+------+------+------+----- >>> >>>>> (0 rows) >>> >>>>> >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# END; >>> >>>>> COMMIT >>> >>>>> test=# >>> >>>>> test=# >>> >>>>> test=# TRUNCATE mvcc_demo; >>> >>>>> TRUNCATE TABLE >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> test=# BEGIN; >>> >>>>> BEGIN >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>> INSERT 0 1 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+------+-----------+------------- >>> >>>>> 80693 | 0 | 0 | f >>> >>>>> 80693 | 0 | 1 | f >>> >>>>> 80693 | 0 | 2 | f >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80693 | 0 | 0 | 0 | 1 >>> >>>>> 80693 | 0 | 1 | 1 | 2 >>> >>>>> 80693 | 0 | 2 | 2 | 3 >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# UPDATE mvcc_demo SET val = 10; >>> >>>>> >>> >>>>> UPDATE 3 >>> >>>>> test=# >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+-------+-----------+------------- >>> >>>>> 80693 | 80693 | 0 | t >>> >>>>> 80693 | 80693 | 1 | t >>> >>>>> 80693 | 80693 | 2 | t >>> >>>>> 80693 | 0 | 3 | f >>> >>>>> 80693 | 0 | 3 | f >>> >>>>> 80693 | 0 | 3 | f >>> >>>>> (6 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80693 | 0 | 3 | 3 | 10 >>> >>>>> 80693 | 0 | 3 | 3 | 10 >>> >>>>> 80693 | 0 | 3 | 3 | 10 >>> >>>>> (3 rows) >>> >>>>> >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# END; >>> >>>>> COMMIT >>> >>>>> test=# >>> >>>>> test=# TRUNCATE mvcc_demo; >>> >>>>> TRUNCATE TABLE >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> -- From one psql issue >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>> INSERT 0 1 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+------+-----------+------------- >>> >>>>> 80699 | 0 | 0 | f >>> >>>>> (1 row) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by >>> val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80699 | 0 | 0 | 0 | 1 >>> >>>>> (1 row) >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> test=# -- From another issue >>> >>>>> test=# BEGIN; >>> >>>>> BEGIN >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>> INSERT 0 1 >>> >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>> >>>>> INSERT 0 1 >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+------+-----------+------------- >>> >>>>> 80699 | 0 | 0 | f >>> >>>>> 80700 | 0 | 0 | f >>> >>>>> 80700 | 0 | 1 | f >>> >>>>> 80700 | 0 | 2 | f >>> >>>>> (4 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80699 | 0 | 0 | 0 | 1 >>> >>>>> 80700 | 0 | 0 | 0 | 2 >>> >>>>> 80700 | 0 | 1 | 1 | 3 >>> >>>>> 80700 | 0 | 2 | 2 | 4 >>> >>>>> (4 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# UPDATE mvcc_demo SET val = 10; >>> >>>>> >>> >>>>> UPDATE 4 >>> >>>>> test=# >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+-------+-----------+------------- >>> >>>>> 80700 | 80700 | 0 | t >>> >>>>> 80700 | 80700 | 1 | t >>> >>>>> 80700 | 80700 | 2 | t >>> >>>>> 80699 | 80700 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> (8 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+------+------+------+----- >>> >>>>> 80700 | 0 | 3 | 3 | 10 >>> >>>>> 80700 | 0 | 3 | 3 | 10 >>> >>>>> 80700 | 0 | 3 | 3 | 10 >>> >>>>> 80700 | 0 | 3 | 3 | 10 >>> >>>>> (4 rows) >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> test=# -- Before finishing this, issue these from the first >>> psql >>> >>>>> test=# SELECT t_xmin AS xmin, >>> >>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>> is_combocid >>> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>> -------+-------+-----------+------------- >>> >>>>> 80700 | 80700 | 0 | t >>> >>>>> 80700 | 80700 | 1 | t >>> >>>>> 80700 | 80700 | 2 | t >>> >>>>> 80699 | 80700 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> 80700 | 0 | 3 | f >>> >>>>> (8 rows) >>> >>>>> >>> >>>>> test=# >>> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by >>> val; >>> >>>>> xmin | xmax | cmin | cmax | val >>> >>>>> -------+-------+------+------+----- >>> >>>>> 80699 | 80700 | 3 | 3 | 1 >>> >>>>> (1 row) >>> >>>>> >>> >>>>> test=# end; >>> >>>>> COMMIT >>> >>>>> >>> >>>>> >>> >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier >>> >>>>> <mic...@gm...> wrote: >>> >>>>>> >>> >>>>>> Hi, >>> >>>>>> >>> >>>>>> I expect pgxc_node_send_cmd_id to have some impact on >>> performance, so >>> >>>>>> be sure to send it to remote Datanodes really only if necessary. >>> >>>>>> You should put more severe conditions blocking this function cid >>> can >>> >>>>>> easily get incremented in Postgres. >>> >>>>>> >>> >>>>>> Regards, >>> >>>>>> >>> >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt >>> >>>>>> <abb...@en...> wrote: >>> >>>>>>> >>> >>>>>>> PFA a WIP patch implementing the design presented earlier. >>> >>>>>>> The patch is WIP because it still has and FIXME and it shows some >>> >>>>>>> regression failures that need to be fixed, but other than that >>> it confirms >>> >>>>>>> that the suggested design would work fine. The following test >>> cases now work >>> >>>>>>> fine >>> >>>>>>> >>> >>>>>>> drop table tt1; >>> >>>>>>> create table tt1(f1 int) distribute by replication; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> fetch all from c50; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> fetch all from c50; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(3); >>> >>>>>>> >>> >>>>>>> fetch all from c50; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> declare c51 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(3); >>> >>>>>>> fetch all from c50; >>> >>>>>>> fetch all from c51; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> BEGIN; >>> >>>>>>> insert into tt1 values(1); >>> >>>>>>> declare c50 cursor for select * from tt1; >>> >>>>>>> declare c51 cursor for select * from tt1; >>> >>>>>>> insert into tt1 values(2); >>> >>>>>>> insert into tt1 values(3); >>> >>>>>>> fetch all from c50; >>> >>>>>>> fetch all from c51; >>> >>>>>>> COMMIT; >>> >>>>>>> truncate table tt1; >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt >>> >>>>>>> <abb...@en...> wrote: >>> >>>>>>>> >>> >>>>>>>> Hi, >>> >>>>>>>> >>> >>>>>>>> In a multi-statement transaction each statement is given a >>> command >>> >>>>>>>> identifier >>> >>>>>>>> starting from zero and incrementing for each statement. >>> >>>>>>>> These command indentifers are required for extra tracking >>> because >>> >>>>>>>> each >>> >>>>>>>> statement has its own visibility rules with in the transaction. >>> >>>>>>>> For example, a cursor’s contents must remain unchanged even if >>> later >>> >>>>>>>> statements in the >>> >>>>>>>> same transaction modify rows. Such tracking is implemented >>> using >>> >>>>>>>> system command id >>> >>>>>>>> columns cmin/cmax, which is internally actually is a single >>> column. >>> >>>>>>>> >>> >>>>>>>> cmin/cmax come into play in case of multi-statement transactions >>> >>>>>>>> only, >>> >>>>>>>> they are both zero otherwise. >>> >>>>>>>> >>> >>>>>>>> cmin "The command identifier of the statement within the >>> inserting >>> >>>>>>>> transaction." >>> >>>>>>>> cmax "The command identifier of the statement within the >>> deleting >>> >>>>>>>> transaction." >>> >>>>>>>> >>> >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) >>> >>>>>>>> >>> >>>>>>>> ( // A heap tuple is >>> valid >>> >>>>>>>> "now" iff >>> >>>>>>>> Xmin == my-transaction && // inserted by the >>> current >>> >>>>>>>> transaction >>> >>>>>>>> Cmin < my-command && // before this >>> command, and >>> >>>>>>>> ( >>> >>>>>>>> Xmax is null || // the row has not been >>> >>>>>>>> deleted, or >>> >>>>>>>> ( >>> >>>>>>>> Xmax == my-transaction && // it was deleted by >>> the >>> >>>>>>>> current transaction >>> >>>>>>>> Cmax >= my-command // but not before this >>> >>>>>>>> command, >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> || // or >>> >>>>>>>> ( >>> >>>>>>>> Xmin is committed && // the row was >>> inserted by >>> >>>>>>>> a committed transaction, and >>> >>>>>>>> ( >>> >>>>>>>> Xmax is null || // the row has not been >>> >>>>>>>> deleted, or >>> >>>>>>>> ( >>> >>>>>>>> Xmax == my-transaction && // the row is being >>> deleted >>> >>>>>>>> by this transaction >>> >>>>>>>> Cmax >= my-command) || // but it's not deleted >>> >>>>>>>> "yet", or >>> >>>>>>>> ( >>> >>>>>>>> Xmax != my-transaction && // the row was deleted >>> by >>> >>>>>>>> another transaction >>> >>>>>>>> Xmax is not committed // that has not been >>> >>>>>>>> committed >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> ) >>> >>>>>>>> >>> >>>>>>>> Because cmin and cmax are internally a single system column, >>> >>>>>>>> it is therefore not possible to simply record the status of a >>> row >>> >>>>>>>> that is created and expired in the same multi-statement >>> transaction. >>> >>>>>>>> For that reason, a special combo command id is created that >>> >>>>>>>> references >>> >>>>>>>> a local memory hash that contains the actual cmin and cmax >>> values. >>> >>>>>>>> It means that if combo id is being used the number we are seeing >>> >>>>>>>> would not be the cmin or cmax it will be an index into a local >>> >>>>>>>> array that contains a structure with has the actual cmin and >>> cmax >>> >>>>>>>> values. >>> >>>>>>>> >>> >>>>>>>> The following queries (taken mostly from >>> >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>> >>>>>>>> use the contrib module pageinspect, which allows >>> >>>>>>>> visibility of internal heap page structures and all stored rows, >>> >>>>>>>> including those not visible in the current snapshot. >>> >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>> >>>>>>>> >>> >>>>>>>> We are exploring 3 examples here: >>> >>>>>>>> 1) INSERT & DELETE in a single transaction >>> >>>>>>>> 2) INSERT & UPDATE in a single transaction >>> >>>>>>>> 3) INSERT from two different transactions & UPDATE from one >>> >>>>>>>> >>> >>>>>>>> test=# drop table mvcc_demo; >>> >>>>>>>> DROP TABLE >>> >>>>>>>> test=# >>> >>>>>>>> test=# create table mvcc_demo (val int); >>> >>>>>>>> CREATE TABLE >>> >>>>>>>> test=# >>> >>>>>>>> test=# TRUNCATE mvcc_demo; >>> >>>>>>>> TRUNCATE TABLE >>> >>>>>>>> test=# >>> >>>>>>>> test=# BEGIN; >>> >>>>>>>> BEGIN >>> >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show >>> that >>> >>>>>>>> combo id would be different >>> >>>>>>>> DELETE 0 >>> >>>>>>>> test=# DELETE FROM mvcc_demo; >>> >>>>>>>> DELETE 0 >>> >>>>>>>> test=# DELETE FROM mvcc_demo; >>> >>>>>>>> DELETE 0 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+------+-----------+------------- >>> >>>>>>>> 80685 | 0 | 3 | f >>> >>>>>>>> 80685 | 0 | 4 | f >>> >>>>>>>> 80685 | 0 | 5 | f >>> >>>>>>>> (3 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# DELETE FROM mvcc_demo; >>> >>>>>>>> DELETE 3 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+-------+-----------+------------- >>> >>>>>>>> 80685 | 80685 | 0 | t >>> >>>>>>>> 80685 | 80685 | 1 | t >>> >>>>>>>> 80685 | 80685 | 2 | t >>> >>>>>>>> (3 rows) >>> >>>>>>>> >>> >>>>>>>> Note that since is_combocid is true the numbers are not >>> cmin/cmax >>> >>>>>>>> they are actually >>> >>>>>>>> the indexes of the internal array already explained above. >>> >>>>>>>> combo id index 0 would contain cmin 3, cmax 6 >>> >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 >>> >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# END; >>> >>>>>>>> COMMIT >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# TRUNCATE mvcc_demo; >>> >>>>>>>> TRUNCATE TABLE >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# BEGIN; >>> >>>>>>>> BEGIN >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+------+-----------+------------- >>> >>>>>>>> 80675 | 0 | 0 | f >>> >>>>>>>> 80675 | 0 | 1 | f >>> >>>>>>>> 80675 | 0 | 2 | f >>> >>>>>>>> (3 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>> >>>>>>>> UPDATE 3 >>> >>>>>>>> test=# >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+-------+-----------+------------- >>> >>>>>>>> 80675 | 80675 | 0 | t >>> >>>>>>>> 80675 | 80675 | 1 | t >>> >>>>>>>> 80675 | 80675 | 2 | t >>> >>>>>>>> 80675 | 0 | 3 | f >>> >>>>>>>> 80675 | 0 | 3 | f >>> >>>>>>>> 80675 | 0 | 3 | f >>> >>>>>>>> (6 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# END; >>> >>>>>>>> COMMIT >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# TRUNCATE mvcc_demo; >>> >>>>>>>> TRUNCATE TABLE >>> >>>>>>>> test=# >>> >>>>>>>> >>> >>>>>>>> -- From one psql issue >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & >>> X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+------+-----------+------------- >>> >>>>>>>> 80677 | 0 | 0 | f >>> >>>>>>>> (1 row) >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> test=# -- From another issue >>> >>>>>>>> test=# BEGIN; >>> >>>>>>>> BEGIN >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+------+-----------+------------- >>> >>>>>>>> 80677 | 0 | 0 | f >>> >>>>>>>> 80678 | 0 | 0 | f >>> >>>>>>>> 80678 | 0 | 1 | f >>> >>>>>>>> 80678 | 0 | 2 | f >>> >>>>>>>> (4 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>> >>>>>>>> UPDATE 4 >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+-------+-----------+------------- >>> >>>>>>>> 80678 | 80678 | 0 | t >>> >>>>>>>> 80678 | 80678 | 1 | t >>> >>>>>>>> 80678 | 80678 | 2 | t >>> >>>>>>>> 80677 | 80678 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> (8 rows) >>> >>>>>>>> >>> >>>>>>>> test=# >>> >>>>>>>> >>> >>>>>>>> test=# -- Before finishing this, issue these from the first >>> psql >>> >>>>>>>> test=# SELECT t_xmin AS xmin, >>> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>> >>>>>>>> test-# (t_infomask::integer & >>> X'0020'::integer)::bool AS >>> >>>>>>>> is_combocid >>> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> >>>>>>>> test-# ORDER BY 2 DESC, 3; >>> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>> >>>>>>>> -------+-------+-----------+------------- >>> >>>>>>>> 80678 | 80678 | 0 | t >>> >>>>>>>> 80678 | 80678 | 1 | t >>> >>>>>>>> 80678 | 80678 | 2 | t >>> >>>>>>>> 80677 | 80678 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> 80678 | 0 | 3 | f >>> >>>>>>>> (8 rows) >>> >>>>>>>> >>> >>>>>>>> test=# END; >>> >>>>>>>> COMMIT >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> Now consider the case we are trying to solve >>> >>>>>>>> >>> >>>>>>>> drop table tt1; >>> >>>>>>>> create table tt1(f1 int); >>> >>>>>>>> >>> >>>>>>>> BEGIN; >>> >>>>>>>> insert into tt1 values(1); >>> >>>>>>>> declare c50 cursor for select * from tt1; -- should show one >>> row >>> >>>>>>>> only >>> >>>>>>>> insert into tt1 values(2); >>> >>>>>>>> fetch all from c50; >>> >>>>>>>> COMMIT; >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> Consider Data node 1 log >>> >>>>>>>> >>> >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >>> read >>> >>>>>>>> committed READ WRITE] >>> >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >>> >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>> >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>> >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >>> read >>> >>>>>>>> committed READ WRITE] >>> >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>> >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>> >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>> >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >>> read >>> >>>>>>>> committed READ WRITE] >>> >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>> >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>> >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, >>> pgxc_node_str() >>> >>>>>>>> FROM tt1] >>> >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>> >>>>>>>> >>> >>>>>>>> The cursor currently shows both inserted rows because command >>> id at >>> >>>>>>>> data node in >>> >>>>>>>> step (j) is 0 >>> >>>>>>>> step (k) is 1 & >>> >>>>>>>> step (l) is 2 >>> >>>>>>>> >>> >>>>>>>> Where as we need command ids to be >>> >>>>>>>> >>> >>>>>>>> step (j) should be 0 >>> >>>>>>>> step (k) should be 2 & >>> >>>>>>>> step (l) should be 1 >>> >>>>>>>> >>> >>>>>>>> This will solve the cursor visibility problem. >>> >>>>>>>> >>> >>>>>>>> To implement this I suggest we send command IDs to data nodes >>> from >>> >>>>>>>> the coordinator >>> >>>>>>>> like we send gxid. The only difference will be that we do not >>> need >>> >>>>>>>> to take command IDs >>> >>>>>>>> from GTM since they are only valid with in the transaction. >>> >>>>>>>> >>> >>>>>>>> See this example >>> >>>>>>>> >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> ------+------+------+------+---- >>> >>>>>>>> (0 rows) >>> >>>>>>>> >>> >>>>>>>> test=# begin; >>> >>>>>>>> BEGIN >>> >>>>>>>> test=# insert into tt1 values(1); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> (1 row) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(2); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> (2 rows) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(3); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> (3 rows) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(4); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>> >>>>>>>> (4 rows) >>> >>>>>>>> >>> >>>>>>>> test=# end; >>> >>>>>>>> COMMIT >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>> >>>>>>>> (4 rows) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(5); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >>> >>>>>>>> (5 rows) >>> >>>>>>>> >>> >>>>>>>> test=# insert into tt1 values(6); >>> >>>>>>>> INSERT 0 1 >>> >>>>>>>> test=# >>> >>>>>>>> test=# >>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>> >>>>>>>> xmin | xmax | cmin | cmax | f1 >>> >>>>>>>> -------+------+------+------+---- >>> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >>> >>>>>>>> 80617 | 0 | 0 | 0 | 6 >>> >>>>>>>> (6 rows) >>> >>>>>>>> >>> >>>>>>>> Note that at the end of the multi-statement transaction the >>> command >>> >>>>>>>> id gets reset to zero. >>> >>>>>>>> >>> >>>>>>>> -- >>> >>>>>>>> Abbas >>> >>>>>>>> Architect >>> >>>>>>>> EnterpriseDB Corporation >>> >>>>>>>> The Enterprise PostgreSQL Company >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> -- >>> >>>>>>> -- >>> >>>>>>> Abbas >>> >>>>>>> Architect >>> >>>>>>> EnterpriseDB Corporation >>> >>>>>>> The Enterprise PostgreSQL Company >>> >>>>>>> >>> >>>>>>> Phone: 92-334-5100153 >>> >>>>>>> >>> >>>>>>> Website: www.enterprisedb.com >>> >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>>>>>> >>> >>>>>>> This e-mail message (and any attachment) is intended for the use >>> of >>> >>>>>>> the individual or entity to whom it is addressed. This message >>> >>>>>>> contains information from EnterpriseDB Corporation that may be >>> >>>>>>> privileged, confidential, or exempt from disclosure under >>> applicable >>> >>>>>>> law. If you are not the intended recipient or authorized to >>> receive >>> >>>>>>> this for the intended recipient, any use, dissemination, >>> >>>>>>> distribution, >>> >>>>>>> retention, archiving, or copying of this communication is >>> strictly >>> >>>>>>> prohibited. If you have received this e-mail in error, please >>> notify >>> >>>>>>> the sender immediately by reply e-mail and delete this message. >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> ------------------------------------------------------------------------------ >>> >>>>>>> Live Security Virtual Conference >>> >>>>>>> Exclusive live event will cover all the ways today's security and >>> >>>>>>> threat landscape has changed and how IT managers can respond. >>> >>>>>>> Discussions >>> >>>>>>> will include endpoint security, mobile security and the latest in >>> >>>>>>> malware >>> >>>>>>> threats. >>> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> >>>>>>> _______________________________________________ >>> >>>>>>> Postgres-xc-developers mailing list >>> >>>>>>> Pos...@li... >>> >>>>>>> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> -- >>> >>>>>> Michael Paquier >>> >>>>>> http://michael.otacoo.com >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> -- >>> >>>>> -- >>> >>>>> Abbas >>> >>>>> Architect >>> >>>>> EnterpriseDB Corporation >>> >>>>> The Enterprise PostgreSQL Company >>> >>>>> >>> >>>>> Phone: 92-334-5100153 >>> >>>>> >>> >>>>> Website: www.enterprisedb.com >>> >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>>>> >>> >>>>> This e-mail message (and any attachment) is intended for the use of >>> >>>>> the individual or entity to whom it is addressed. This message >>> >>>>> contains information from EnterpriseDB Corporation that may be >>> >>>>> privileged, confidential, or exempt from disclosure under >>> applicable >>> >>>>> law. If you are not the intended recipient or authorized to receive >>> >>>>> this for the intended recipient, any use, dissemination, >>> distribution, >>> >>>>> retention, archiving, or copying of this communication is strictly >>> >>>>> prohibited. If you have received this e-mail in error, please >>> notify >>> >>>>> the sender immediately by reply e-mail and delete this message. >>> >>>>> >>> >>>>> >>> >>>>> >>> ------------------------------------------------------------------------------ >>> >>>>> Live Security Virtual Conference >>> >>>>> Exclusive live event will cover all the ways today's security and >>> >>>>> threat landscape has changed and how IT managers can respond. >>> >>>>> Discussions >>> >>>>> will include endpoint security, mobile security and the latest in >>> >>>>> malware >>> >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> >>>>> _______________________________________________ >>> >>>>> Postgres-xc-developers mailing list >>> >>>>> Pos...@li... >>> >>>>> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> -- >>> >>>> Best Wishes, >>> >>>> Ashutosh Bapat >>> >>>> EntepriseDB Corporation >>> >>>> The Enterprise Postgres Company >>> >>>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> >>> -- >>> >>> Abbas >>> >>> Architect >>> >>> EnterpriseDB Corporation >>> >>> The Enterprise PostgreSQL Company >>> >>> >>> >>> Phone: 92-334-5100153 >>> >>> >>> >>> Website: www.enterprisedb.com >>> >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> >>> >>> This e-mail message (and any attachment) is intended for the use of >>> >>> the individual or entity to whom it is addressed. This message >>> >>> contains information from EnterpriseDB Corporation that may be >>> >>> privileged, confidential, or exempt from disclosure under applicable >>> >>> law. If you are not the intended recipient or authorized to receive >>> >>> this for the intended recipient, any use, dissemination, >>> distribution, >>> >>> retention, archiving, or copying of this communication is strictly >>> >>> prohibited. If you have received this e-mail in error, please notify >>> >>> the sender immediately by reply e-mail and delete this message. >>> >> >>> >> >>> >> >>> >> >>> >> -- >>> >> -- >>> >> Abbas >>> >> Architect >>> >> EnterpriseDB Corporation >>> >> The Enterprise PostgreSQL Company >>> >> >>> >> Phone: 92-334-5100153 >>> >> >>> >> Website: www.enterprisedb.com >>> >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> >> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >> >>> >> This e-mail message (and any attachment) is intended for the use of >>> >> the individual or entity to whom it is addressed. This message >>> >> contains information from EnterpriseDB Corporation that may be >>> >> privileged, confidential, or exempt from disclosure under applicable >>> >> law. If you are not the intended recipient or authorized to receive >>> >> this for the intended recipient, any use, dissemination, distribution, >>> >> retention, archiving, or copying of this communication is strictly >>> >> prohibited. If you have received this e-mail in error, please notify >>> >> the sender immediately by reply e-mail and delete this message. >>> > >>> > >>> > >>> > >>> > -- >>> > -- >>> > Abbas >>> > Architect >>> > EnterpriseDB Corporation >>> > The Enterprise PostgreSQL Company >>> > >>> > Phone: 92-334-5100153 >>> > >>> > Website: www.enterprisedb.com >>> > EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> > Follow us on Twitter: http://www.twitter.com/enterprisedb >>> > >>> > This e-mail message (and any attachment) is intended for the use of >>> > the individual or entity to whom it is addressed. This message >>> > contains information from EnterpriseDB Corporation that may be >>> > privileged, confidential, or exempt from disclosure under applicable >>> > law. If you are not the intended recipient or authorized to receive >>> > this for the intended recipient, any use, dissemination, distribution, >>> > retention, archiving, or copying of this communication is strictly >>> > prohibited. If you have received this e-mail in error, please notify >>> > the sender immediately by reply e-mail and delete this message. >>> > >>> > >>> ------------------------------------------------------------------------------ >>> > Live Security Virtual Conference >>> > Exclusive live event will cover all the ways today's security and >>> > threat landscape has changed and how IT managers can respond. >>> Discussions >>> > will include endpoint security, mobile security and the latest in >>> malware >>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> > _______________________________________________ >>> > Postgres-xc-developers mailing list >>> > Pos...@li... >>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> > >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Michael P. <mic...@gm...> - 2012-07-19 00:01:33
|
Great! I will try to grab some time to look into that, perhaps not today as I have several meetings. On Thu, Jul 19, 2012 at 7:56 AM, Abbas Butt <abb...@en...>wrote: > Hi, > Attached please find the patch that solves cursor visibility problem as > well as the inserting into child by selecting from the parent problem using > the design discussed earlier. > Test cases are added and regressions are fixed accordingly. > Patch is ready for review. > > Thanks. > > > On Tue, Jul 10, 2012 at 6:54 AM, Koichi Suzuki <koi...@gm...>wrote: > >> I basically agree on the idea. I also think Michael needs this kind >> of extension to feedback trigger target rows. We should design >> NoticeResponse extensible enough for various usage. >> >> Regards; >> ---------- >> Koichi Suzuki >> >> >> 2012/7/10 Abbas Butt <abb...@en...>: >> > Here is the update on the issue. >> > >> > It was decided that the changes done by the data nodes in the command id >> > should be communicated back to the coordinator and that the coordinator >> > should choose the largest of all the received values as the next >> command id. >> > >> > It was suggested that we should check that a skipped value of command id >> > should not create a problem for the next operations on the table. I have >> > verified both by studying the code and by actually changing the function >> > CommandCounterIncrement to increment the command id by 3 and running >> > regression. It worked fine so a hole in command id is not a problem. >> > >> > Next it was suggested that we should use the mechanism currently in >> place to >> > send # of tuples affected by a statement to communicate the changed >> command >> > id to the coordinator. >> > Please refer to this link in the documentation >> > http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html >> > Note that there is no such message format that exists in the current >> over >> > the wire protocol to communicate the # of tuples affected by a >> statement. >> > The libpq functions that we might suspect do the same are PQntuples and >> > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where as >> > PQcmdTuples extracts the # of tuples affected from the CommandComplete >> 'C' >> > message string. We cannot use these mechanisms for our purpose. >> > >> > I evaluated the use of NoticeResponse 'N' for sending changed command >> id but >> > the message format of NoticeResponse mandates the use of certain fields >> > which will make our sent messages un-necessarily bulky and would consume >> > network bandwidth for no reason. >> > >> > I therefore suggest the we use a new message for communicating XC >> specific >> > information from data node to coordinator. Currently we will use it for >> > command id but we will design the message format flexible enough to >> > accommodate future XC requirements. Whenever the data node increments >> > command id we will send the information to the coordinator and >> > handle_response function in execRemote.c would be changed to >> accommodate new >> > message. Since coordinators will never use the new message therefore the >> > existing clients do not need to bother. >> > >> > Comments or suggestions are welcome. >> > >> > Regards >> > >> > >> > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en... >> > >> > wrote: >> >> >> >> While fixing the regression failures resulting from the changes done by >> >> the patch I was able to fix all except this test case >> >> >> >> set enforce_two_phase_commit = off; >> >> >> >> CREATE TEMP TABLE users ( >> >> id INT PRIMARY KEY, >> >> name VARCHAR NOT NULL >> >> ) DISTRIBUTE BY REPLICATION; >> >> >> >> INSERT INTO users VALUES (1, 'Jozko'); >> >> INSERT INTO users VALUES (2, 'Ferko'); >> >> INSERT INTO users VALUES (3, 'Samko'); >> >> CREATE TEMP TABLE tasks ( >> >> id INT PRIMARY KEY, >> >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL >> >> ) DISTRIBUTE BY REPLICATION; >> >> >> >> INSERT INTO tasks VALUES (1,1,NULL,NULL); >> >> INSERT INTO tasks VALUES (2,2,2,NULL); >> >> INSERT INTO tasks VALUES (3,3,3,3); >> >> >> >> BEGIN; >> >> >> >> UPDATE tasks set id=id WHERE id=2; >> >> SELECT * FROM tasks; >> >> >> >> DELETE FROM users WHERE id = 2; >> >> SELECT * FROM tasks; >> >> >> >> COMMIT; >> >> >> >> The obtained output from the last select statement is >> >> >> >> id | owner | worker | checked_by >> >> ----+-------+--------+------------ >> >> 1 | 1 | | >> >> 3 | 3 | 3 | 3 >> >> 2 | 2 | 2 | >> >> (3 rows) >> >> >> >> where as the expected output is >> >> >> >> id | owner | worker | checked_by >> >> ----+-------+--------+------------ >> >> 1 | 1 | | >> >> 3 | 3 | 3 | 3 >> >> 2 | | | >> >> (3 rows) >> >> >> >> Note that the owner and worker have been set to null due to "ON DELETE >> SET >> >> NULL". >> >> >> >> Here is the reason why this does not work properly. Consider the last >> >> transaction >> >> >> >> BEGIN; >> >> >> >> UPDATE tasks set id=id WHERE id=2; >> >> SELECT * FROM tasks; >> >> >> >> DELETE FROM users WHERE id = 2; >> >> SELECT * FROM tasks; >> >> >> >> COMMIT; >> >> >> >> Here are the command id values the coordinator sends to the data node >> >> >> >> 0 for the first update that gets incremented to 1 because this is a DML >> >> and needs to consume a command id >> >> 1 for the first select that remains 1 since it is not required to be >> >> consumed. >> >> 1 for the delete statement that gets incremented to 2 because it is a >> DML >> >> and 2 for the last select. >> >> >> >> Now this is what happens on the data node >> >> >> >> When the data node receives the first update with command id 0, it >> >> increments it once due to the update itself and once due to the update >> run >> >> because of "ON UPDATE CASCADE". Hence the command id at the end of >> update on >> >> data node is 2. >> >> The first select comes to data node with command id 1, which is >> incorrect. >> >> The user's intention is to see data after update and its command id >> should >> >> be 2. >> >> Now delete comes with command id 1, and data node increments it once >> due >> >> to the delete itself and once due to the update run because of "ON >> DELETE >> >> SET NULL", hence the command id at the end of delete is 3. >> >> Coordinator now sends last select with command id 2, which is again >> >> incorrect since user's intention is to see data after delete and select >> >> should have been sent to data node with command id 3 or 4. >> >> >> >> Every time data node increments command id due to any statements run >> >> implicitly either because of the constraints or triggers, this scheme >> of >> >> sending command ids to data node from coordinator to solve fetch >> problems >> >> would fail. >> >> >> >> Datanode can have a trigger e.g. inserting rows thrice on every single >> >> insert and would increment command id on every insert. Therefore this >> design >> >> cannot work. >> >> >> >> Either we have to synchronize command ids between datanode and >> coordinator >> >> through GTM >> >> OR >> >> We will have to send the DECLARE CURSOR down to the datanode. In this >> case >> >> however we will not be able to send the cursor query as it is because >> the >> >> query might contain a join on two tables which exist on a disjoint set >> of >> >> data nodes. >> >> >> >> Comments or suggestions are welcome. >> >> >> >> >> >> >> >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt < >> abb...@en...> >> >> wrote: >> >>> >> >>> Thanks for your comments. >> >>> >> >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat >> >>> <ash...@en...> wrote: >> >>>> >> >>>> Hi Abbas, >> >>>> I have few comments to make >> >>>> 1. With this patch there are two variables for having command Id, >> that >> >>>> is going to cause confusion and will be a maintenance burden, might >> be error >> >>>> prone. Is it possible to use a single variable instead of two? >> >>> >> >>> >> >>> Are you talking about receivedCommandId and currentCommandId? If yes, >> I >> >>> would prefer not having a packet received from coordinator overwrite >> the >> >>> currentCommandId at data node, because I am not 100% sure about the >> life >> >>> time of currentCommandId, I might overwrite it before time. It would >> be safe >> >>> to let currentCommandId as is unless we are compelled to get the next >> >>> command ID, and have the received command id take priority at that >> time. >> >>> >> >>>> >> >>>> Right now there is some code which is specific to cursors in your >> patch. >> >>>> If you can plug the coordinator command id somehow into >> currentCommandId, >> >>>> you won't need that code and any other code which needs coordinator >> command >> >>>> ID will be automatically taken care of. >> >>> >> >>> >> >>> That code is required to solve a problem. Consider this case when a >> >>> coordinator received this transaction >> >>> >> >>> >> >>> BEGIN; >> >>> insert into tt1 values(1); >> >>> declare c50 cursor for select * from tt1; >> >>> insert into tt1 values(2); >> >>> fetch all from c50; >> >>> COMMIT; >> >>> >> >>> While sending select to the data node in response to a fetch we need >> to >> >>> know what was the command ID of the declare cursor statement and we >> need to >> >>> send that command ID to the data node for this particular fetch. This >> is the >> >>> main idea behind this solution. >> >>> >> >>> The first insert goes to the data node with command id 0, the second >> >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When >> >>> coordinator sees fetch it needs to send select to the data node with >> command >> >>> ID 1 rather than 3. >> >>> >> >>> >> >>>> >> >>>> 2. A non-transaction on coordinator can spawn tranasactions on >> datanode >> >>>> or subtransactions (if there is already a transaction running). Does >> your >> >>>> patch handle that case? >> >>> >> >>> >> >>> No and it does not need to, because that case has no known problems >> that >> >>> we need to solve. I don't think my patch would impact any such case >> but I >> >>> will analyze any failures that I may get in regressions. >> >>> >> >>>> >> >>>> Should we do more thorough research in the transaction management, >> esp. >> >>>> to see the impact of getting same command id for two commands on the >> >>>> datanode? >> >>> >> >>> >> >>> If we issue two commands with the same command ID then we will >> definitely >> >>> have visibility issues according to the rules I have already >> explained. But >> >>> we will not have two commands sent to the data node with same command >> id. >> >>> >> >>>> >> >>>> >> >>>> >> >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt >> >>>> <abb...@en...> wrote: >> >>>>> >> >>>>> Hi Ashutosh, >> >>>>> Here are the results with the val column, Thanks. >> >>>>> >> >>>>> test=# drop table mvcc_demo; >> >>>>> DROP TABLE >> >>>>> test=# >> >>>>> test=# create table mvcc_demo (val int); >> >>>>> CREATE TABLE >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> test=# >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >> >>>>> combo id would be different >> >>>>> DELETE 0 >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 0 >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 0 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80689 | 0 | 3 | f >> >>>>> 80689 | 0 | 4 | f >> >>>>> 80689 | 0 | 5 | f >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80689 | 0 | 3 | 3 | 1 >> >>>>> 80689 | 0 | 4 | 4 | 2 >> >>>>> 80689 | 0 | 5 | 5 | 3 >> >>>>> >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# DELETE FROM mvcc_demo; >> >>>>> DELETE 3 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80689 | 80689 | 0 | t >> >>>>> 80689 | 80689 | 1 | t >> >>>>> 80689 | 80689 | 2 | t >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> ------+------+------+------+----- >> >>>>> (0 rows) >> >>>>> >> >>>>> >> >>>>> test=# >> >>>>> test=# END; >> >>>>> COMMIT >> >>>>> test=# >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80693 | 0 | 0 | f >> >>>>> 80693 | 0 | 1 | f >> >>>>> 80693 | 0 | 2 | f >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80693 | 0 | 0 | 0 | 1 >> >>>>> 80693 | 0 | 1 | 1 | 2 >> >>>>> 80693 | 0 | 2 | 2 | 3 >> >>>>> (3 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# UPDATE mvcc_demo SET val = 10; >> >>>>> >> >>>>> UPDATE 3 >> >>>>> test=# >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80693 | 80693 | 0 | t >> >>>>> 80693 | 80693 | 1 | t >> >>>>> 80693 | 80693 | 2 | t >> >>>>> 80693 | 0 | 3 | f >> >>>>> 80693 | 0 | 3 | f >> >>>>> 80693 | 0 | 3 | f >> >>>>> (6 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> 80693 | 0 | 3 | 3 | 10 >> >>>>> (3 rows) >> >>>>> >> >>>>> >> >>>>> test=# >> >>>>> test=# END; >> >>>>> COMMIT >> >>>>> test=# >> >>>>> test=# TRUNCATE mvcc_demo; >> >>>>> TRUNCATE TABLE >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> -- From one psql issue >> >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80699 | 0 | 0 | f >> >>>>> (1 row) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80699 | 0 | 0 | 0 | 1 >> >>>>> (1 row) >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# -- From another issue >> >>>>> test=# BEGIN; >> >>>>> BEGIN >> >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>> INSERT 0 1 >> >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >> >>>>> INSERT 0 1 >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+------+-----------+------------- >> >>>>> 80699 | 0 | 0 | f >> >>>>> 80700 | 0 | 0 | f >> >>>>> 80700 | 0 | 1 | f >> >>>>> 80700 | 0 | 2 | f >> >>>>> (4 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80699 | 0 | 0 | 0 | 1 >> >>>>> 80700 | 0 | 0 | 0 | 2 >> >>>>> 80700 | 0 | 1 | 1 | 3 >> >>>>> 80700 | 0 | 2 | 2 | 4 >> >>>>> (4 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# UPDATE mvcc_demo SET val = 10; >> >>>>> >> >>>>> UPDATE 4 >> >>>>> test=# >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80700 | 80700 | 0 | t >> >>>>> 80700 | 80700 | 1 | t >> >>>>> 80700 | 80700 | 2 | t >> >>>>> 80699 | 80700 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> (8 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+------+------+------+----- >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> 80700 | 0 | 3 | 3 | 10 >> >>>>> (4 rows) >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> test=# -- Before finishing this, issue these from the first psql >> >>>>> test=# SELECT t_xmin AS xmin, >> >>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>> is_combocid >> >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>> test-# ORDER BY 2 DESC, 3; >> >>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>> -------+-------+-----------+------------- >> >>>>> 80700 | 80700 | 0 | t >> >>>>> 80700 | 80700 | 1 | t >> >>>>> 80700 | 80700 | 2 | t >> >>>>> 80699 | 80700 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> 80700 | 0 | 3 | f >> >>>>> (8 rows) >> >>>>> >> >>>>> test=# >> >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> >>>>> xmin | xmax | cmin | cmax | val >> >>>>> -------+-------+------+------+----- >> >>>>> 80699 | 80700 | 3 | 3 | 1 >> >>>>> (1 row) >> >>>>> >> >>>>> test=# end; >> >>>>> COMMIT >> >>>>> >> >>>>> >> >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier >> >>>>> <mic...@gm...> wrote: >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, >> so >> >>>>>> be sure to send it to remote Datanodes really only if necessary. >> >>>>>> You should put more severe conditions blocking this function cid >> can >> >>>>>> easily get incremented in Postgres. >> >>>>>> >> >>>>>> Regards, >> >>>>>> >> >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt >> >>>>>> <abb...@en...> wrote: >> >>>>>>> >> >>>>>>> PFA a WIP patch implementing the design presented earlier. >> >>>>>>> The patch is WIP because it still has and FIXME and it shows some >> >>>>>>> regression failures that need to be fixed, but other than that it >> confirms >> >>>>>>> that the suggested design would work fine. The following test >> cases now work >> >>>>>>> fine >> >>>>>>> >> >>>>>>> drop table tt1; >> >>>>>>> create table tt1(f1 int) distribute by replication; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> >> >>>>>>> insert into tt1 values(2); >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> insert into tt1 values(2); >> >>>>>>> >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(3); >> >>>>>>> >> >>>>>>> fetch all from c50; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> declare c51 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(3); >> >>>>>>> fetch all from c50; >> >>>>>>> fetch all from c51; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> BEGIN; >> >>>>>>> insert into tt1 values(1); >> >>>>>>> declare c50 cursor for select * from tt1; >> >>>>>>> declare c51 cursor for select * from tt1; >> >>>>>>> insert into tt1 values(2); >> >>>>>>> insert into tt1 values(3); >> >>>>>>> fetch all from c50; >> >>>>>>> fetch all from c51; >> >>>>>>> COMMIT; >> >>>>>>> truncate table tt1; >> >>>>>>> >> >>>>>>> >> >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt >> >>>>>>> <abb...@en...> wrote: >> >>>>>>>> >> >>>>>>>> Hi, >> >>>>>>>> >> >>>>>>>> In a multi-statement transaction each statement is given a >> command >> >>>>>>>> identifier >> >>>>>>>> starting from zero and incrementing for each statement. >> >>>>>>>> These command indentifers are required for extra tracking because >> >>>>>>>> each >> >>>>>>>> statement has its own visibility rules with in the transaction. >> >>>>>>>> For example, a cursor’s contents must remain unchanged even if >> later >> >>>>>>>> statements in the >> >>>>>>>> same transaction modify rows. Such tracking is implemented using >> >>>>>>>> system command id >> >>>>>>>> columns cmin/cmax, which is internally actually is a single >> column. >> >>>>>>>> >> >>>>>>>> cmin/cmax come into play in case of multi-statement transactions >> >>>>>>>> only, >> >>>>>>>> they are both zero otherwise. >> >>>>>>>> >> >>>>>>>> cmin "The command identifier of the statement within the >> inserting >> >>>>>>>> transaction." >> >>>>>>>> cmax "The command identifier of the statement within the deleting >> >>>>>>>> transaction." >> >>>>>>>> >> >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) >> >>>>>>>> >> >>>>>>>> ( // A heap tuple is valid >> >>>>>>>> "now" iff >> >>>>>>>> Xmin == my-transaction && // inserted by the >> current >> >>>>>>>> transaction >> >>>>>>>> Cmin < my-command && // before this command, >> and >> >>>>>>>> ( >> >>>>>>>> Xmax is null || // the row has not been >> >>>>>>>> deleted, or >> >>>>>>>> ( >> >>>>>>>> Xmax == my-transaction && // it was deleted by the >> >>>>>>>> current transaction >> >>>>>>>> Cmax >= my-command // but not before this >> >>>>>>>> command, >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> || // or >> >>>>>>>> ( >> >>>>>>>> Xmin is committed && // the row was inserted >> by >> >>>>>>>> a committed transaction, and >> >>>>>>>> ( >> >>>>>>>> Xmax is null || // the row has not been >> >>>>>>>> deleted, or >> >>>>>>>> ( >> >>>>>>>> Xmax == my-transaction && // the row is being >> deleted >> >>>>>>>> by this transaction >> >>>>>>>> Cmax >= my-command) || // but it's not deleted >> >>>>>>>> "yet", or >> >>>>>>>> ( >> >>>>>>>> Xmax != my-transaction && // the row was deleted >> by >> >>>>>>>> another transaction >> >>>>>>>> Xmax is not committed // that has not been >> >>>>>>>> committed >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> ) >> >>>>>>>> >> >>>>>>>> Because cmin and cmax are internally a single system column, >> >>>>>>>> it is therefore not possible to simply record the status of a row >> >>>>>>>> that is created and expired in the same multi-statement >> transaction. >> >>>>>>>> For that reason, a special combo command id is created that >> >>>>>>>> references >> >>>>>>>> a local memory hash that contains the actual cmin and cmax >> values. >> >>>>>>>> It means that if combo id is being used the number we are seeing >> >>>>>>>> would not be the cmin or cmax it will be an index into a local >> >>>>>>>> array that contains a structure with has the actual cmin and cmax >> >>>>>>>> values. >> >>>>>>>> >> >>>>>>>> The following queries (taken mostly from >> >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >> >>>>>>>> use the contrib module pageinspect, which allows >> >>>>>>>> visibility of internal heap page structures and all stored rows, >> >>>>>>>> including those not visible in the current snapshot. >> >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >> >>>>>>>> >> >>>>>>>> We are exploring 3 examples here: >> >>>>>>>> 1) INSERT & DELETE in a single transaction >> >>>>>>>> 2) INSERT & UPDATE in a single transaction >> >>>>>>>> 3) INSERT from two different transactions & UPDATE from one >> >>>>>>>> >> >>>>>>>> test=# drop table mvcc_demo; >> >>>>>>>> DROP TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# create table mvcc_demo (val int); >> >>>>>>>> CREATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show >> that >> >>>>>>>> combo id would be different >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 0 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80685 | 0 | 3 | f >> >>>>>>>> 80685 | 0 | 4 | f >> >>>>>>>> 80685 | 0 | 5 | f >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# DELETE FROM mvcc_demo; >> >>>>>>>> DELETE 3 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80685 | 80685 | 0 | t >> >>>>>>>> 80685 | 80685 | 1 | t >> >>>>>>>> 80685 | 80685 | 2 | t >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax >> >>>>>>>> they are actually >> >>>>>>>> the indexes of the internal array already explained above. >> >>>>>>>> combo id index 0 would contain cmin 3, cmax 6 >> >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 >> >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80675 | 0 | 0 | f >> >>>>>>>> 80675 | 0 | 1 | f >> >>>>>>>> 80675 | 0 | 2 | f >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >> >>>>>>>> UPDATE 3 >> >>>>>>>> test=# >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80675 | 80675 | 0 | t >> >>>>>>>> 80675 | 80675 | 1 | t >> >>>>>>>> 80675 | 80675 | 2 | t >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> 80675 | 0 | 3 | f >> >>>>>>>> (6 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# TRUNCATE mvcc_demo; >> >>>>>>>> TRUNCATE TABLE >> >>>>>>>> test=# >> >>>>>>>> >> >>>>>>>> -- From one psql issue >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool >> AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80677 | 0 | 0 | f >> >>>>>>>> (1 row) >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> test=# -- From another issue >> >>>>>>>> test=# BEGIN; >> >>>>>>>> BEGIN >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+------+-----------+------------- >> >>>>>>>> 80677 | 0 | 0 | f >> >>>>>>>> 80678 | 0 | 0 | f >> >>>>>>>> 80678 | 0 | 1 | f >> >>>>>>>> 80678 | 0 | 2 | f >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >> >>>>>>>> UPDATE 4 >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80678 | 80678 | 0 | t >> >>>>>>>> 80678 | 80678 | 1 | t >> >>>>>>>> 80678 | 80678 | 2 | t >> >>>>>>>> 80677 | 80678 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> (8 rows) >> >>>>>>>> >> >>>>>>>> test=# >> >>>>>>>> >> >>>>>>>> test=# -- Before finishing this, issue these from the first >> psql >> >>>>>>>> test=# SELECT t_xmin AS xmin, >> >>>>>>>> test-# t_xmax::text::int8 AS xmax, >> >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >> >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool >> AS >> >>>>>>>> is_combocid >> >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> >>>>>>>> test-# ORDER BY 2 DESC, 3; >> >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >> >>>>>>>> -------+-------+-----------+------------- >> >>>>>>>> 80678 | 80678 | 0 | t >> >>>>>>>> 80678 | 80678 | 1 | t >> >>>>>>>> 80678 | 80678 | 2 | t >> >>>>>>>> 80677 | 80678 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> 80678 | 0 | 3 | f >> >>>>>>>> (8 rows) >> >>>>>>>> >> >>>>>>>> test=# END; >> >>>>>>>> COMMIT >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Now consider the case we are trying to solve >> >>>>>>>> >> >>>>>>>> drop table tt1; >> >>>>>>>> create table tt1(f1 int); >> >>>>>>>> >> >>>>>>>> BEGIN; >> >>>>>>>> insert into tt1 values(1); >> >>>>>>>> declare c50 cursor for select * from tt1; -- should show one row >> >>>>>>>> only >> >>>>>>>> insert into tt1 values(2); >> >>>>>>>> fetch all from c50; >> >>>>>>>> COMMIT; >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Consider Data node 1 log >> >>>>>>>> >> >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >> >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >> >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >> >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >> >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >> >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >> >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL >> read >> >>>>>>>> committed READ WRITE] >> >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >> >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >> >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() >> >>>>>>>> FROM tt1] >> >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >> >>>>>>>> >> >>>>>>>> The cursor currently shows both inserted rows because command id >> at >> >>>>>>>> data node in >> >>>>>>>> step (j) is 0 >> >>>>>>>> step (k) is 1 & >> >>>>>>>> step (l) is 2 >> >>>>>>>> >> >>>>>>>> Where as we need command ids to be >> >>>>>>>> >> >>>>>>>> step (j) should be 0 >> >>>>>>>> step (k) should be 2 & >> >>>>>>>> step (l) should be 1 >> >>>>>>>> >> >>>>>>>> This will solve the cursor visibility problem. >> >>>>>>>> >> >>>>>>>> To implement this I suggest we send command IDs to data nodes >> from >> >>>>>>>> the coordinator >> >>>>>>>> like we send gxid. The only difference will be that we do not >> need >> >>>>>>>> to take command IDs >> >>>>>>>> from GTM since they are only valid with in the transaction. >> >>>>>>>> >> >>>>>>>> See this example >> >>>>>>>> >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> ------+------+------+------+---- >> >>>>>>>> (0 rows) >> >>>>>>>> >> >>>>>>>> test=# begin; >> >>>>>>>> BEGIN >> >>>>>>>> test=# insert into tt1 values(1); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> (1 row) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(2); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> (2 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(3); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> (3 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(4); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# end; >> >>>>>>>> COMMIT >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> (4 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(5); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >> >>>>>>>> (5 rows) >> >>>>>>>> >> >>>>>>>> test=# insert into tt1 values(6); >> >>>>>>>> INSERT 0 1 >> >>>>>>>> test=# >> >>>>>>>> test=# >> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >> >>>>>>>> xmin | xmax | cmin | cmax | f1 >> >>>>>>>> -------+------+------+------+---- >> >>>>>>>> 80615 | 0 | 0 | 0 | 1 >> >>>>>>>> 80615 | 0 | 1 | 1 | 2 >> >>>>>>>> 80615 | 0 | 2 | 2 | 3 >> >>>>>>>> 80615 | 0 | 3 | 3 | 4 >> >>>>>>>> 80616 | 0 | 0 | 0 | 5 >> >>>>>>>> 80617 | 0 | 0 | 0 | 6 >> >>>>>>>> (6 rows) >> >>>>>>>> >> >>>>>>>> Note that at the end of the multi-statement transaction the >> command >> >>>>>>>> id gets reset to zero. >> >>>>>>>> >> >>>>>>>> -- >> >>>>>>>> Abbas >> >>>>>>>> Architect >> >>>>>>>> EnterpriseDB Corporation >> >>>>>>>> The Enterprise PostgreSQL Company >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> -- >> >>>>>>> Abbas >> >>>>>>> Architect >> >>>>>>> EnterpriseDB Corporation >> >>>>>>> The Enterprise PostgreSQL Company >> >>>>>>> >> >>>>>>> Phone: 92-334-5100153 >> >>>>>>> >> >>>>>>> Website: www.enterprisedb.com >> >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>>>>>> >> >>>>>>> This e-mail message (and any attachment) is intended for the use >> of >> >>>>>>> the individual or entity to whom it is addressed. This message >> >>>>>>> contains information from EnterpriseDB Corporation that may be >> >>>>>>> privileged, confidential, or exempt from disclosure under >> applicable >> >>>>>>> law. If you are not the intended recipient or authorized to >> receive >> >>>>>>> this for the intended recipient, any use, dissemination, >> >>>>>>> distribution, >> >>>>>>> retention, archiving, or copying of this communication is strictly >> >>>>>>> prohibited. If you have received this e-mail in error, please >> notify >> >>>>>>> the sender immediately by reply e-mail and delete this message. >> >>>>>>> >> >>>>>>> >> >>>>>>> >> ------------------------------------------------------------------------------ >> >>>>>>> Live Security Virtual Conference >> >>>>>>> Exclusive live event will cover all the ways today's security and >> >>>>>>> threat landscape has changed and how IT managers can respond. >> >>>>>>> Discussions >> >>>>>>> will include endpoint security, mobile security and the latest in >> >>>>>>> malware >> >>>>>>> threats. >> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>>>>>> _______________________________________________ >> >>>>>>> Postgres-xc-developers mailing list >> >>>>>>> Pos...@li... >> >>>>>>> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> -- >> >>>>>> Michael Paquier >> >>>>>> http://michael.otacoo.com >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> -- >> >>>>> Abbas >> >>>>> Architect >> >>>>> EnterpriseDB Corporation >> >>>>> The Enterprise PostgreSQL Company >> >>>>> >> >>>>> Phone: 92-334-5100153 >> >>>>> >> >>>>> Website: www.enterprisedb.com >> >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>>>> >> >>>>> This e-mail message (and any attachment) is intended for the use of >> >>>>> the individual or entity to whom it is addressed. This message >> >>>>> contains information from EnterpriseDB Corporation that may be >> >>>>> privileged, confidential, or exempt from disclosure under applicable >> >>>>> law. If you are not the intended recipient or authorized to receive >> >>>>> this for the intended recipient, any use, dissemination, >> distribution, >> >>>>> retention, archiving, or copying of this communication is strictly >> >>>>> prohibited. If you have received this e-mail in error, please notify >> >>>>> the sender immediately by reply e-mail and delete this message. >> >>>>> >> >>>>> >> >>>>> >> ------------------------------------------------------------------------------ >> >>>>> Live Security Virtual Conference >> >>>>> Exclusive live event will cover all the ways today's security and >> >>>>> threat landscape has changed and how IT managers can respond. >> >>>>> Discussions >> >>>>> will include endpoint security, mobile security and the latest in >> >>>>> malware >> >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>>>> _______________________________________________ >> >>>>> Postgres-xc-developers mailing list >> >>>>> Pos...@li... >> >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>>>> >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> Best Wishes, >> >>>> Ashutosh Bapat >> >>>> EntepriseDB Corporation >> >>>> The Enterprise Postgres Company >> >>>> >> >>> >> >>> >> >>> >> >>> -- >> >>> -- >> >>> Abbas >> >>> Architect >> >>> EnterpriseDB Corporation >> >>> The Enterprise PostgreSQL Company >> >>> >> >>> Phone: 92-334-5100153 >> >>> >> >>> Website: www.enterprisedb.com >> >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>> >> >>> This e-mail message (and any attachment) is intended for the use of >> >>> the individual or entity to whom it is addressed. This message >> >>> contains information from EnterpriseDB Corporation that may be >> >>> privileged, confidential, or exempt from disclosure under applicable >> >>> law. If you are not the intended recipient or authorized to receive >> >>> this for the intended recipient, any use, dissemination, distribution, >> >>> retention, archiving, or copying of this communication is strictly >> >>> prohibited. If you have received this e-mail in error, please notify >> >>> the sender immediately by reply e-mail and delete this message. >> >> >> >> >> >> >> >> >> >> -- >> >> -- >> >> Abbas >> >> Architect >> >> EnterpriseDB Corporation >> >> The Enterprise PostgreSQL Company >> >> >> >> Phone: 92-334-5100153 >> >> >> >> Website: www.enterprisedb.com >> >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> >> >> This e-mail message (and any attachment) is intended for the use of >> >> the individual or entity to whom it is addressed. This message >> >> contains information from EnterpriseDB Corporation that may be >> >> privileged, confidential, or exempt from disclosure under applicable >> >> law. If you are not the intended recipient or authorized to receive >> >> this for the intended recipient, any use, dissemination, distribution, >> >> retention, archiving, or copying of this communication is strictly >> >> prohibited. If you have received this e-mail in error, please notify >> >> the sender immediately by reply e-mail and delete this message. >> > >> > >> > >> > >> > -- >> > -- >> > Abbas >> > Architect >> > EnterpriseDB Corporation >> > The Enterprise PostgreSQL Company >> > >> > Phone: 92-334-5100153 >> > >> > Website: www.enterprisedb.com >> > EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> > Follow us on Twitter: http://www.twitter.com/enterprisedb >> > >> > This e-mail message (and any attachment) is intended for the use of >> > the individual or entity to whom it is addressed. This message >> > contains information from EnterpriseDB Corporation that may be >> > privileged, confidential, or exempt from disclosure under applicable >> > law. If you are not the intended recipient or authorized to receive >> > this for the intended recipient, any use, dissemination, distribution, >> > retention, archiving, or copying of this communication is strictly >> > prohibited. If you have received this e-mail in error, please notify >> > the sender immediately by reply e-mail and delete this message. >> > >> > >> ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> Discussions >> > will include endpoint security, mobile security and the latest in >> malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Michael Paquier http://michael.otacoo.com |
From: Abbas B. <abb...@en...> - 2012-07-18 22:56:48
|
Hi, Attached please find the patch that solves cursor visibility problem as well as the inserting into child by selecting from the parent problem using the design discussed earlier. Test cases are added and regressions are fixed accordingly. Patch is ready for review. Thanks. On Tue, Jul 10, 2012 at 6:54 AM, Koichi Suzuki <koi...@gm...>wrote: > I basically agree on the idea. I also think Michael needs this kind > of extension to feedback trigger target rows. We should design > NoticeResponse extensible enough for various usage. > > Regards; > ---------- > Koichi Suzuki > > > 2012/7/10 Abbas Butt <abb...@en...>: > > Here is the update on the issue. > > > > It was decided that the changes done by the data nodes in the command id > > should be communicated back to the coordinator and that the coordinator > > should choose the largest of all the received values as the next command > id. > > > > It was suggested that we should check that a skipped value of command id > > should not create a problem for the next operations on the table. I have > > verified both by studying the code and by actually changing the function > > CommandCounterIncrement to increment the command id by 3 and running > > regression. It worked fine so a hole in command id is not a problem. > > > > Next it was suggested that we should use the mechanism currently in > place to > > send # of tuples affected by a statement to communicate the changed > command > > id to the coordinator. > > Please refer to this link in the documentation > > http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html > > Note that there is no such message format that exists in the current > over > > the wire protocol to communicate the # of tuples affected by a statement. > > The libpq functions that we might suspect do the same are PQntuples and > > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where as > > PQcmdTuples extracts the # of tuples affected from the CommandComplete > 'C' > > message string. We cannot use these mechanisms for our purpose. > > > > I evaluated the use of NoticeResponse 'N' for sending changed command id > but > > the message format of NoticeResponse mandates the use of certain fields > > which will make our sent messages un-necessarily bulky and would consume > > network bandwidth for no reason. > > > > I therefore suggest the we use a new message for communicating XC > specific > > information from data node to coordinator. Currently we will use it for > > command id but we will design the message format flexible enough to > > accommodate future XC requirements. Whenever the data node increments > > command id we will send the information to the coordinator and > > handle_response function in execRemote.c would be changed to accommodate > new > > message. Since coordinators will never use the new message therefore the > > existing clients do not need to bother. > > > > Comments or suggestions are welcome. > > > > Regards > > > > > > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en...> > > wrote: > >> > >> While fixing the regression failures resulting from the changes done by > >> the patch I was able to fix all except this test case > >> > >> set enforce_two_phase_commit = off; > >> > >> CREATE TEMP TABLE users ( > >> id INT PRIMARY KEY, > >> name VARCHAR NOT NULL > >> ) DISTRIBUTE BY REPLICATION; > >> > >> INSERT INTO users VALUES (1, 'Jozko'); > >> INSERT INTO users VALUES (2, 'Ferko'); > >> INSERT INTO users VALUES (3, 'Samko'); > >> CREATE TEMP TABLE tasks ( > >> id INT PRIMARY KEY, > >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, > >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, > >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL > >> ) DISTRIBUTE BY REPLICATION; > >> > >> INSERT INTO tasks VALUES (1,1,NULL,NULL); > >> INSERT INTO tasks VALUES (2,2,2,NULL); > >> INSERT INTO tasks VALUES (3,3,3,3); > >> > >> BEGIN; > >> > >> UPDATE tasks set id=id WHERE id=2; > >> SELECT * FROM tasks; > >> > >> DELETE FROM users WHERE id = 2; > >> SELECT * FROM tasks; > >> > >> COMMIT; > >> > >> The obtained output from the last select statement is > >> > >> id | owner | worker | checked_by > >> ----+-------+--------+------------ > >> 1 | 1 | | > >> 3 | 3 | 3 | 3 > >> 2 | 2 | 2 | > >> (3 rows) > >> > >> where as the expected output is > >> > >> id | owner | worker | checked_by > >> ----+-------+--------+------------ > >> 1 | 1 | | > >> 3 | 3 | 3 | 3 > >> 2 | | | > >> (3 rows) > >> > >> Note that the owner and worker have been set to null due to "ON DELETE > SET > >> NULL". > >> > >> Here is the reason why this does not work properly. Consider the last > >> transaction > >> > >> BEGIN; > >> > >> UPDATE tasks set id=id WHERE id=2; > >> SELECT * FROM tasks; > >> > >> DELETE FROM users WHERE id = 2; > >> SELECT * FROM tasks; > >> > >> COMMIT; > >> > >> Here are the command id values the coordinator sends to the data node > >> > >> 0 for the first update that gets incremented to 1 because this is a DML > >> and needs to consume a command id > >> 1 for the first select that remains 1 since it is not required to be > >> consumed. > >> 1 for the delete statement that gets incremented to 2 because it is a > DML > >> and 2 for the last select. > >> > >> Now this is what happens on the data node > >> > >> When the data node receives the first update with command id 0, it > >> increments it once due to the update itself and once due to the update > run > >> because of "ON UPDATE CASCADE". Hence the command id at the end of > update on > >> data node is 2. > >> The first select comes to data node with command id 1, which is > incorrect. > >> The user's intention is to see data after update and its command id > should > >> be 2. > >> Now delete comes with command id 1, and data node increments it once due > >> to the delete itself and once due to the update run because of "ON > DELETE > >> SET NULL", hence the command id at the end of delete is 3. > >> Coordinator now sends last select with command id 2, which is again > >> incorrect since user's intention is to see data after delete and select > >> should have been sent to data node with command id 3 or 4. > >> > >> Every time data node increments command id due to any statements run > >> implicitly either because of the constraints or triggers, this scheme of > >> sending command ids to data node from coordinator to solve fetch > problems > >> would fail. > >> > >> Datanode can have a trigger e.g. inserting rows thrice on every single > >> insert and would increment command id on every insert. Therefore this > design > >> cannot work. > >> > >> Either we have to synchronize command ids between datanode and > coordinator > >> through GTM > >> OR > >> We will have to send the DECLARE CURSOR down to the datanode. In this > case > >> however we will not be able to send the cursor query as it is because > the > >> query might contain a join on two tables which exist on a disjoint set > of > >> data nodes. > >> > >> Comments or suggestions are welcome. > >> > >> > >> > >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt < > abb...@en...> > >> wrote: > >>> > >>> Thanks for your comments. > >>> > >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat > >>> <ash...@en...> wrote: > >>>> > >>>> Hi Abbas, > >>>> I have few comments to make > >>>> 1. With this patch there are two variables for having command Id, that > >>>> is going to cause confusion and will be a maintenance burden, might > be error > >>>> prone. Is it possible to use a single variable instead of two? > >>> > >>> > >>> Are you talking about receivedCommandId and currentCommandId? If yes, I > >>> would prefer not having a packet received from coordinator overwrite > the > >>> currentCommandId at data node, because I am not 100% sure about the > life > >>> time of currentCommandId, I might overwrite it before time. It would > be safe > >>> to let currentCommandId as is unless we are compelled to get the next > >>> command ID, and have the received command id take priority at that > time. > >>> > >>>> > >>>> Right now there is some code which is specific to cursors in your > patch. > >>>> If you can plug the coordinator command id somehow into > currentCommandId, > >>>> you won't need that code and any other code which needs coordinator > command > >>>> ID will be automatically taken care of. > >>> > >>> > >>> That code is required to solve a problem. Consider this case when a > >>> coordinator received this transaction > >>> > >>> > >>> BEGIN; > >>> insert into tt1 values(1); > >>> declare c50 cursor for select * from tt1; > >>> insert into tt1 values(2); > >>> fetch all from c50; > >>> COMMIT; > >>> > >>> While sending select to the data node in response to a fetch we need to > >>> know what was the command ID of the declare cursor statement and we > need to > >>> send that command ID to the data node for this particular fetch. This > is the > >>> main idea behind this solution. > >>> > >>> The first insert goes to the data node with command id 0, the second > >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When > >>> coordinator sees fetch it needs to send select to the data node with > command > >>> ID 1 rather than 3. > >>> > >>> > >>>> > >>>> 2. A non-transaction on coordinator can spawn tranasactions on > datanode > >>>> or subtransactions (if there is already a transaction running). Does > your > >>>> patch handle that case? > >>> > >>> > >>> No and it does not need to, because that case has no known problems > that > >>> we need to solve. I don't think my patch would impact any such case > but I > >>> will analyze any failures that I may get in regressions. > >>> > >>>> > >>>> Should we do more thorough research in the transaction management, > esp. > >>>> to see the impact of getting same command id for two commands on the > >>>> datanode? > >>> > >>> > >>> If we issue two commands with the same command ID then we will > definitely > >>> have visibility issues according to the rules I have already > explained. But > >>> we will not have two commands sent to the data node with same command > id. > >>> > >>>> > >>>> > >>>> > >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt > >>>> <abb...@en...> wrote: > >>>>> > >>>>> Hi Ashutosh, > >>>>> Here are the results with the val column, Thanks. > >>>>> > >>>>> test=# drop table mvcc_demo; > >>>>> DROP TABLE > >>>>> test=# > >>>>> test=# create table mvcc_demo (val int); > >>>>> CREATE TABLE > >>>>> test=# > >>>>> test=# TRUNCATE mvcc_demo; > >>>>> TRUNCATE TABLE > >>>>> test=# > >>>>> test=# BEGIN; > >>>>> BEGIN > >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that > >>>>> combo id would be different > >>>>> DELETE 0 > >>>>> test=# DELETE FROM mvcc_demo; > >>>>> DELETE 0 > >>>>> test=# DELETE FROM mvcc_demo; > >>>>> DELETE 0 > >>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>> INSERT 0 1 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+------+-----------+------------- > >>>>> 80689 | 0 | 3 | f > >>>>> 80689 | 0 | 4 | f > >>>>> 80689 | 0 | 5 | f > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80689 | 0 | 3 | 3 | 1 > >>>>> 80689 | 0 | 4 | 4 | 2 > >>>>> 80689 | 0 | 5 | 5 | 3 > >>>>> > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# DELETE FROM mvcc_demo; > >>>>> DELETE 3 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+-------+-----------+------------- > >>>>> 80689 | 80689 | 0 | t > >>>>> 80689 | 80689 | 1 | t > >>>>> 80689 | 80689 | 2 | t > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> ------+------+------+------+----- > >>>>> (0 rows) > >>>>> > >>>>> > >>>>> test=# > >>>>> test=# END; > >>>>> COMMIT > >>>>> test=# > >>>>> test=# > >>>>> test=# TRUNCATE mvcc_demo; > >>>>> TRUNCATE TABLE > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> test=# BEGIN; > >>>>> BEGIN > >>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>> INSERT 0 1 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+------+-----------+------------- > >>>>> 80693 | 0 | 0 | f > >>>>> 80693 | 0 | 1 | f > >>>>> 80693 | 0 | 2 | f > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80693 | 0 | 0 | 0 | 1 > >>>>> 80693 | 0 | 1 | 1 | 2 > >>>>> 80693 | 0 | 2 | 2 | 3 > >>>>> (3 rows) > >>>>> > >>>>> test=# > >>>>> test=# UPDATE mvcc_demo SET val = 10; > >>>>> > >>>>> UPDATE 3 > >>>>> test=# > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+-------+-----------+------------- > >>>>> 80693 | 80693 | 0 | t > >>>>> 80693 | 80693 | 1 | t > >>>>> 80693 | 80693 | 2 | t > >>>>> 80693 | 0 | 3 | f > >>>>> 80693 | 0 | 3 | f > >>>>> 80693 | 0 | 3 | f > >>>>> (6 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80693 | 0 | 3 | 3 | 10 > >>>>> 80693 | 0 | 3 | 3 | 10 > >>>>> 80693 | 0 | 3 | 3 | 10 > >>>>> (3 rows) > >>>>> > >>>>> > >>>>> test=# > >>>>> test=# END; > >>>>> COMMIT > >>>>> test=# > >>>>> test=# TRUNCATE mvcc_demo; > >>>>> TRUNCATE TABLE > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- From one psql issue > >>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>> INSERT 0 1 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+------+-----------+------------- > >>>>> 80699 | 0 | 0 | f > >>>>> (1 row) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80699 | 0 | 0 | 0 | 1 > >>>>> (1 row) > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> test=# -- From another issue > >>>>> test=# BEGIN; > >>>>> BEGIN > >>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>> INSERT 0 1 > >>>>> test=# INSERT INTO mvcc_demo VALUES (4); > >>>>> INSERT 0 1 > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+------+-----------+------------- > >>>>> 80699 | 0 | 0 | f > >>>>> 80700 | 0 | 0 | f > >>>>> 80700 | 0 | 1 | f > >>>>> 80700 | 0 | 2 | f > >>>>> (4 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80699 | 0 | 0 | 0 | 1 > >>>>> 80700 | 0 | 0 | 0 | 2 > >>>>> 80700 | 0 | 1 | 1 | 3 > >>>>> 80700 | 0 | 2 | 2 | 4 > >>>>> (4 rows) > >>>>> > >>>>> test=# > >>>>> test=# UPDATE mvcc_demo SET val = 10; > >>>>> > >>>>> UPDATE 4 > >>>>> test=# > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+-------+-----------+------------- > >>>>> 80700 | 80700 | 0 | t > >>>>> 80700 | 80700 | 1 | t > >>>>> 80700 | 80700 | 2 | t > >>>>> 80699 | 80700 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> (8 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+------+------+------+----- > >>>>> 80700 | 0 | 3 | 3 | 10 > >>>>> 80700 | 0 | 3 | 3 | 10 > >>>>> 80700 | 0 | 3 | 3 | 10 > >>>>> 80700 | 0 | 3 | 3 | 10 > >>>>> (4 rows) > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> test=# -- Before finishing this, issue these from the first psql > >>>>> test=# SELECT t_xmin AS xmin, > >>>>> test-# t_xmax::text::int8 AS xmax, > >>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>> is_combocid > >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>> test-# ORDER BY 2 DESC, 3; > >>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>> -------+-------+-----------+------------- > >>>>> 80700 | 80700 | 0 | t > >>>>> 80700 | 80700 | 1 | t > >>>>> 80700 | 80700 | 2 | t > >>>>> 80699 | 80700 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> 80700 | 0 | 3 | f > >>>>> (8 rows) > >>>>> > >>>>> test=# > >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > >>>>> xmin | xmax | cmin | cmax | val > >>>>> -------+-------+------+------+----- > >>>>> 80699 | 80700 | 3 | 3 | 1 > >>>>> (1 row) > >>>>> > >>>>> test=# end; > >>>>> COMMIT > >>>>> > >>>>> > >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier > >>>>> <mic...@gm...> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, > so > >>>>>> be sure to send it to remote Datanodes really only if necessary. > >>>>>> You should put more severe conditions blocking this function cid can > >>>>>> easily get incremented in Postgres. > >>>>>> > >>>>>> Regards, > >>>>>> > >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt > >>>>>> <abb...@en...> wrote: > >>>>>>> > >>>>>>> PFA a WIP patch implementing the design presented earlier. > >>>>>>> The patch is WIP because it still has and FIXME and it shows some > >>>>>>> regression failures that need to be fixed, but other than that it > confirms > >>>>>>> that the suggested design would work fine. The following test > cases now work > >>>>>>> fine > >>>>>>> > >>>>>>> drop table tt1; > >>>>>>> create table tt1(f1 int) distribute by replication; > >>>>>>> > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> insert into tt1 values(1); > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> insert into tt1 values(2); > >>>>>>> fetch all from c50; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> insert into tt1 values(1); > >>>>>>> > >>>>>>> insert into tt1 values(2); > >>>>>>> fetch all from c50; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> insert into tt1 values(1); > >>>>>>> insert into tt1 values(2); > >>>>>>> > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> insert into tt1 values(3); > >>>>>>> > >>>>>>> fetch all from c50; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> insert into tt1 values(1); > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> insert into tt1 values(2); > >>>>>>> declare c51 cursor for select * from tt1; > >>>>>>> insert into tt1 values(3); > >>>>>>> fetch all from c50; > >>>>>>> fetch all from c51; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> > >>>>>>> BEGIN; > >>>>>>> insert into tt1 values(1); > >>>>>>> declare c50 cursor for select * from tt1; > >>>>>>> declare c51 cursor for select * from tt1; > >>>>>>> insert into tt1 values(2); > >>>>>>> insert into tt1 values(3); > >>>>>>> fetch all from c50; > >>>>>>> fetch all from c51; > >>>>>>> COMMIT; > >>>>>>> truncate table tt1; > >>>>>>> > >>>>>>> > >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt > >>>>>>> <abb...@en...> wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> In a multi-statement transaction each statement is given a command > >>>>>>>> identifier > >>>>>>>> starting from zero and incrementing for each statement. > >>>>>>>> These command indentifers are required for extra tracking because > >>>>>>>> each > >>>>>>>> statement has its own visibility rules with in the transaction. > >>>>>>>> For example, a cursor’s contents must remain unchanged even if > later > >>>>>>>> statements in the > >>>>>>>> same transaction modify rows. Such tracking is implemented using > >>>>>>>> system command id > >>>>>>>> columns cmin/cmax, which is internally actually is a single > column. > >>>>>>>> > >>>>>>>> cmin/cmax come into play in case of multi-statement transactions > >>>>>>>> only, > >>>>>>>> they are both zero otherwise. > >>>>>>>> > >>>>>>>> cmin "The command identifier of the statement within the inserting > >>>>>>>> transaction." > >>>>>>>> cmax "The command identifier of the statement within the deleting > >>>>>>>> transaction." > >>>>>>>> > >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) > >>>>>>>> > >>>>>>>> ( // A heap tuple is valid > >>>>>>>> "now" iff > >>>>>>>> Xmin == my-transaction && // inserted by the > current > >>>>>>>> transaction > >>>>>>>> Cmin < my-command && // before this command, > and > >>>>>>>> ( > >>>>>>>> Xmax is null || // the row has not been > >>>>>>>> deleted, or > >>>>>>>> ( > >>>>>>>> Xmax == my-transaction && // it was deleted by the > >>>>>>>> current transaction > >>>>>>>> Cmax >= my-command // but not before this > >>>>>>>> command, > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> || // or > >>>>>>>> ( > >>>>>>>> Xmin is committed && // the row was inserted > by > >>>>>>>> a committed transaction, and > >>>>>>>> ( > >>>>>>>> Xmax is null || // the row has not been > >>>>>>>> deleted, or > >>>>>>>> ( > >>>>>>>> Xmax == my-transaction && // the row is being > deleted > >>>>>>>> by this transaction > >>>>>>>> Cmax >= my-command) || // but it's not deleted > >>>>>>>> "yet", or > >>>>>>>> ( > >>>>>>>> Xmax != my-transaction && // the row was deleted by > >>>>>>>> another transaction > >>>>>>>> Xmax is not committed // that has not been > >>>>>>>> committed > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> ) > >>>>>>>> > >>>>>>>> Because cmin and cmax are internally a single system column, > >>>>>>>> it is therefore not possible to simply record the status of a row > >>>>>>>> that is created and expired in the same multi-statement > transaction. > >>>>>>>> For that reason, a special combo command id is created that > >>>>>>>> references > >>>>>>>> a local memory hash that contains the actual cmin and cmax values. > >>>>>>>> It means that if combo id is being used the number we are seeing > >>>>>>>> would not be the cmin or cmax it will be an index into a local > >>>>>>>> array that contains a structure with has the actual cmin and cmax > >>>>>>>> values. > >>>>>>>> > >>>>>>>> The following queries (taken mostly from > >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) > >>>>>>>> use the contrib module pageinspect, which allows > >>>>>>>> visibility of internal heap page structures and all stored rows, > >>>>>>>> including those not visible in the current snapshot. > >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) > >>>>>>>> > >>>>>>>> We are exploring 3 examples here: > >>>>>>>> 1) INSERT & DELETE in a single transaction > >>>>>>>> 2) INSERT & UPDATE in a single transaction > >>>>>>>> 3) INSERT from two different transactions & UPDATE from one > >>>>>>>> > >>>>>>>> test=# drop table mvcc_demo; > >>>>>>>> DROP TABLE > >>>>>>>> test=# > >>>>>>>> test=# create table mvcc_demo (val int); > >>>>>>>> CREATE TABLE > >>>>>>>> test=# > >>>>>>>> test=# TRUNCATE mvcc_demo; > >>>>>>>> TRUNCATE TABLE > >>>>>>>> test=# > >>>>>>>> test=# BEGIN; > >>>>>>>> BEGIN > >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that > >>>>>>>> combo id would be different > >>>>>>>> DELETE 0 > >>>>>>>> test=# DELETE FROM mvcc_demo; > >>>>>>>> DELETE 0 > >>>>>>>> test=# DELETE FROM mvcc_demo; > >>>>>>>> DELETE 0 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+------+-----------+------------- > >>>>>>>> 80685 | 0 | 3 | f > >>>>>>>> 80685 | 0 | 4 | f > >>>>>>>> 80685 | 0 | 5 | f > >>>>>>>> (3 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# DELETE FROM mvcc_demo; > >>>>>>>> DELETE 3 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+-------+-----------+------------- > >>>>>>>> 80685 | 80685 | 0 | t > >>>>>>>> 80685 | 80685 | 1 | t > >>>>>>>> 80685 | 80685 | 2 | t > >>>>>>>> (3 rows) > >>>>>>>> > >>>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax > >>>>>>>> they are actually > >>>>>>>> the indexes of the internal array already explained above. > >>>>>>>> combo id index 0 would contain cmin 3, cmax 6 > >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 > >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# END; > >>>>>>>> COMMIT > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# TRUNCATE mvcc_demo; > >>>>>>>> TRUNCATE TABLE > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# BEGIN; > >>>>>>>> BEGIN > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+------+-----------+------------- > >>>>>>>> 80675 | 0 | 0 | f > >>>>>>>> 80675 | 0 | 1 | f > >>>>>>>> 80675 | 0 | 2 | f > >>>>>>>> (3 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; > >>>>>>>> UPDATE 3 > >>>>>>>> test=# > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+-------+-----------+------------- > >>>>>>>> 80675 | 80675 | 0 | t > >>>>>>>> 80675 | 80675 | 1 | t > >>>>>>>> 80675 | 80675 | 2 | t > >>>>>>>> 80675 | 0 | 3 | f > >>>>>>>> 80675 | 0 | 3 | f > >>>>>>>> 80675 | 0 | 3 | f > >>>>>>>> (6 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# END; > >>>>>>>> COMMIT > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# TRUNCATE mvcc_demo; > >>>>>>>> TRUNCATE TABLE > >>>>>>>> test=# > >>>>>>>> > >>>>>>>> -- From one psql issue > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool > AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+------+-----------+------------- > >>>>>>>> 80677 | 0 | 0 | f > >>>>>>>> (1 row) > >>>>>>>> > >>>>>>>> > >>>>>>>> test=# -- From another issue > >>>>>>>> test=# BEGIN; > >>>>>>>> BEGIN > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+------+-----------+------------- > >>>>>>>> 80677 | 0 | 0 | f > >>>>>>>> 80678 | 0 | 0 | f > >>>>>>>> 80678 | 0 | 1 | f > >>>>>>>> 80678 | 0 | 2 | f > >>>>>>>> (4 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; > >>>>>>>> UPDATE 4 > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+-------+-----------+------------- > >>>>>>>> 80678 | 80678 | 0 | t > >>>>>>>> 80678 | 80678 | 1 | t > >>>>>>>> 80678 | 80678 | 2 | t > >>>>>>>> 80677 | 80678 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> (8 rows) > >>>>>>>> > >>>>>>>> test=# > >>>>>>>> > >>>>>>>> test=# -- Before finishing this, issue these from the first > psql > >>>>>>>> test=# SELECT t_xmin AS xmin, > >>>>>>>> test-# t_xmax::text::int8 AS xmax, > >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, > >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool > AS > >>>>>>>> is_combocid > >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > >>>>>>>> test-# ORDER BY 2 DESC, 3; > >>>>>>>> xmin | xmax | cmin_cmax | is_combocid > >>>>>>>> -------+-------+-----------+------------- > >>>>>>>> 80678 | 80678 | 0 | t > >>>>>>>> 80678 | 80678 | 1 | t > >>>>>>>> 80678 | 80678 | 2 | t > >>>>>>>> 80677 | 80678 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> 80678 | 0 | 3 | f > >>>>>>>> (8 rows) > >>>>>>>> > >>>>>>>> test=# END; > >>>>>>>> COMMIT > >>>>>>>> > >>>>>>>> > >>>>>>>> Now consider the case we are trying to solve > >>>>>>>> > >>>>>>>> drop table tt1; > >>>>>>>> create table tt1(f1 int); > >>>>>>>> > >>>>>>>> BEGIN; > >>>>>>>> insert into tt1 values(1); > >>>>>>>> declare c50 cursor for select * from tt1; -- should show one row > >>>>>>>> only > >>>>>>>> insert into tt1 values(2); > >>>>>>>> fetch all from c50; > >>>>>>>> COMMIT; > >>>>>>>> > >>>>>>>> > >>>>>>>> Consider Data node 1 log > >>>>>>>> > >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL > read > >>>>>>>> committed READ WRITE] > >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] > >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] > >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] > >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL > read > >>>>>>>> committed READ WRITE] > >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] > >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] > >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] > >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL > read > >>>>>>>> committed READ WRITE] > >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] > >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] > >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() > >>>>>>>> FROM tt1] > >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] > >>>>>>>> > >>>>>>>> The cursor currently shows both inserted rows because command id > at > >>>>>>>> data node in > >>>>>>>> step (j) is 0 > >>>>>>>> step (k) is 1 & > >>>>>>>> step (l) is 2 > >>>>>>>> > >>>>>>>> Where as we need command ids to be > >>>>>>>> > >>>>>>>> step (j) should be 0 > >>>>>>>> step (k) should be 2 & > >>>>>>>> step (l) should be 1 > >>>>>>>> > >>>>>>>> This will solve the cursor visibility problem. > >>>>>>>> > >>>>>>>> To implement this I suggest we send command IDs to data nodes from > >>>>>>>> the coordinator > >>>>>>>> like we send gxid. The only difference will be that we do not need > >>>>>>>> to take command IDs > >>>>>>>> from GTM since they are only valid with in the transaction. > >>>>>>>> > >>>>>>>> See this example > >>>>>>>> > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> ------+------+------+------+---- > >>>>>>>> (0 rows) > >>>>>>>> > >>>>>>>> test=# begin; > >>>>>>>> BEGIN > >>>>>>>> test=# insert into tt1 values(1); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> (1 row) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(2); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> (2 rows) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(3); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> (3 rows) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(4); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> 80615 | 0 | 3 | 3 | 4 > >>>>>>>> (4 rows) > >>>>>>>> > >>>>>>>> test=# end; > >>>>>>>> COMMIT > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> 80615 | 0 | 3 | 3 | 4 > >>>>>>>> (4 rows) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(5); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> 80615 | 0 | 3 | 3 | 4 > >>>>>>>> 80616 | 0 | 0 | 0 | 5 > >>>>>>>> (5 rows) > >>>>>>>> > >>>>>>>> test=# insert into tt1 values(6); > >>>>>>>> INSERT 0 1 > >>>>>>>> test=# > >>>>>>>> test=# > >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; > >>>>>>>> xmin | xmax | cmin | cmax | f1 > >>>>>>>> -------+------+------+------+---- > >>>>>>>> 80615 | 0 | 0 | 0 | 1 > >>>>>>>> 80615 | 0 | 1 | 1 | 2 > >>>>>>>> 80615 | 0 | 2 | 2 | 3 > >>>>>>>> 80615 | 0 | 3 | 3 | 4 > >>>>>>>> 80616 | 0 | 0 | 0 | 5 > >>>>>>>> 80617 | 0 | 0 | 0 | 6 > >>>>>>>> (6 rows) > >>>>>>>> > >>>>>>>> Note that at the end of the multi-statement transaction the > command > >>>>>>>> id gets reset to zero. > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Abbas > >>>>>>>> Architect > >>>>>>>> EnterpriseDB Corporation > >>>>>>>> The Enterprise PostgreSQL Company > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> -- > >>>>>>> Abbas > >>>>>>> Architect > >>>>>>> EnterpriseDB Corporation > >>>>>>> The Enterprise PostgreSQL Company > >>>>>>> > >>>>>>> Phone: 92-334-5100153 > >>>>>>> > >>>>>>> Website: www.enterprisedb.com > >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ > >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb > >>>>>>> > >>>>>>> This e-mail message (and any attachment) is intended for the use of > >>>>>>> the individual or entity to whom it is addressed. This message > >>>>>>> contains information from EnterpriseDB Corporation that may be > >>>>>>> privileged, confidential, or exempt from disclosure under > applicable > >>>>>>> law. If you are not the intended recipient or authorized to receive > >>>>>>> this for the intended recipient, any use, dissemination, > >>>>>>> distribution, > >>>>>>> retention, archiving, or copying of this communication is strictly > >>>>>>> prohibited. If you have received this e-mail in error, please > notify > >>>>>>> the sender immediately by reply e-mail and delete this message. > >>>>>>> > >>>>>>> > >>>>>>> > ------------------------------------------------------------------------------ > >>>>>>> Live Security Virtual Conference > >>>>>>> Exclusive live event will cover all the ways today's security and > >>>>>>> threat landscape has changed and how IT managers can respond. > >>>>>>> Discussions > >>>>>>> will include endpoint security, mobile security and the latest in > >>>>>>> malware > >>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >>>>>>> _______________________________________________ > >>>>>>> Postgres-xc-developers mailing list > >>>>>>> Pos...@li... > >>>>>>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >>>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> Michael Paquier > >>>>>> http://michael.otacoo.com > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> -- > >>>>> Abbas > >>>>> Architect > >>>>> EnterpriseDB Corporation > >>>>> The Enterprise PostgreSQL Company > >>>>> > >>>>> Phone: 92-334-5100153 > >>>>> > >>>>> Website: www.enterprisedb.com > >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ > >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb > >>>>> > >>>>> This e-mail message (and any attachment) is intended for the use of > >>>>> the individual or entity to whom it is addressed. This message > >>>>> contains information from EnterpriseDB Corporation that may be > >>>>> privileged, confidential, or exempt from disclosure under applicable > >>>>> law. If you are not the intended recipient or authorized to receive > >>>>> this for the intended recipient, any use, dissemination, > distribution, > >>>>> retention, archiving, or copying of this communication is strictly > >>>>> prohibited. If you have received this e-mail in error, please notify > >>>>> the sender immediately by reply e-mail and delete this message. > >>>>> > >>>>> > >>>>> > ------------------------------------------------------------------------------ > >>>>> Live Security Virtual Conference > >>>>> Exclusive live event will cover all the ways today's security and > >>>>> threat landscape has changed and how IT managers can respond. > >>>>> Discussions > >>>>> will include endpoint security, mobile security and the latest in > >>>>> malware > >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >>>>> _______________________________________________ > >>>>> Postgres-xc-developers mailing list > >>>>> Pos...@li... > >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >>>>> > >>>> > >>>> > >>>> > >>>> -- > >>>> Best Wishes, > >>>> Ashutosh Bapat > >>>> EntepriseDB Corporation > >>>> The Enterprise Postgres Company > >>>> > >>> > >>> > >>> > >>> -- > >>> -- > >>> Abbas > >>> Architect > >>> EnterpriseDB Corporation > >>> The Enterprise PostgreSQL Company > >>> > >>> Phone: 92-334-5100153 > >>> > >>> Website: www.enterprisedb.com > >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ > >>> Follow us on Twitter: http://www.twitter.com/enterprisedb > >>> > >>> This e-mail message (and any attachment) is intended for the use of > >>> the individual or entity to whom it is addressed. This message > >>> contains information from EnterpriseDB Corporation that may be > >>> privileged, confidential, or exempt from disclosure under applicable > >>> law. If you are not the intended recipient or authorized to receive > >>> this for the intended recipient, any use, dissemination, distribution, > >>> retention, archiving, or copying of this communication is strictly > >>> prohibited. If you have received this e-mail in error, please notify > >>> the sender immediately by reply e-mail and delete this message. > >> > >> > >> > >> > >> -- > >> -- > >> Abbas > >> Architect > >> EnterpriseDB Corporation > >> The Enterprise PostgreSQL Company > >> > >> Phone: 92-334-5100153 > >> > >> Website: www.enterprisedb.com > >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ > >> Follow us on Twitter: http://www.twitter.com/enterprisedb > >> > >> This e-mail message (and any attachment) is intended for the use of > >> the individual or entity to whom it is addressed. This message > >> contains information from EnterpriseDB Corporation that may be > >> privileged, confidential, or exempt from disclosure under applicable > >> law. If you are not the intended recipient or authorized to receive > >> this for the intended recipient, any use, dissemination, distribution, > >> retention, archiving, or copying of this communication is strictly > >> prohibited. If you have received this e-mail in error, please notify > >> the sender immediately by reply e-mail and delete this message. > > > > > > > > > > -- > > -- > > Abbas > > Architect > > EnterpriseDB Corporation > > The Enterprise PostgreSQL Company > > > > Phone: 92-334-5100153 > > > > Website: www.enterprisedb.com > > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > > Follow us on Twitter: http://www.twitter.com/enterprisedb > > > > This e-mail message (and any attachment) is intended for the use of > > the individual or entity to whom it is addressed. This message > > contains information from EnterpriseDB Corporation that may be > > privileged, confidential, or exempt from disclosure under applicable > > law. If you are not the intended recipient or authorized to receive > > this for the intended recipient, any use, dissemination, distribution, > > retention, archiving, or copying of this communication is strictly > > prohibited. If you have received this e-mail in error, please notify > > the sender immediately by reply e-mail and delete this message. > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Postgres-xc-developers mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: 搜旁网 <se...@gr...> - 2012-07-18 08:16:23
|
看得见的朋友a.s_link{display:block; background-color:#cc1717; color:#fff; width:100px; height:28px; line-height:28px; text-align:center; text-decoration:none; }a.s_link:hover { background-color : #B11414; } 我住在北京天通苑小区,我在搜旁网认识很多住的非常近的人,我喜欢台球和爬山,希望你在这里也能找到见得着的朋友:) 接受邀请 如您不希望再次收到我们的邮件,请随时点击这里取消订阅。 |
From: Nikhil S. <ni...@st...> - 2012-07-18 08:02:06
|
>> Probably Nikhil misunderstood intent of CREATE NODE. CREATE NODE has >> nothing to do with redistribution. > > Yes, think so. >> Well, in some ways yes. But I was looking at it from the viewpoint of auto-redistribution of some tables based on the addition of a new datanode for example. Regards, Nikhils >> >>> >>> - For certain redistribution operations, we do NOT always need a >>> transaction block! Just take the example of a replicated table changed to a >>> distributed table. You just need to send a DELETE query to remote nodes to >>> remove only the tuples that do not satisfy a hash condition. This is one of >>> those things I am working on now. >> >> >> We will need transaction block to make the utility transactionally safe. >>> >>> >>> To my mind, each argument here makes necessary this feature in core. All >>> combined even strengthen my arguments. >>> >> >> I am not able to answer still serious problems - >> 1. Checking permissions to allow such a function to be run. Should we >> allow anyone who can run those DDLs/DMLs run the utility? > > Only the owner of the table. But this needs to be controlled internally. We > may have serious security issues lying under that. > >> >> 2. Cache invalidation - how do we invalidate plan caches after >> redistributing the data? Plan stores the list of nodes where the queries >> should be executed. Storing this list of nodes in plans is essential from >> JOIN reduction. > > Yes, and for a couple of other things. You will need to add externally some > runs of DEALLOCATE ALL, but this will impact all tables and not only the one > being redistributed. My latest patch only invalidated cache of the table > involved by redistribution. > >> >> >> Are there answers to these questions? >> >> But this approach has some merits - 1. It reduces the impact on the code. > > Yes, but redistribution needs to be a core operation. > >> >> 2. There are few things not so good in this approach, like disk and >> network usage, which needs to be optimized. By taking this approach, we >> provide an easy (albeit unoptimised) way of re-distributing the tables and >> can work on the final version of code changes for ALTER TABLE DISTRIBUTE >> afterwards, keeping longer slot for the work. > > In this case you do not need any code! You could also do a simple CREATE > TABLE AS to redistribution the table as you wish to a new table, drop the > old table, and rename the new table with the old name. This could also be > done with 1.0. > You should also have a look at my latest patch, it already includes the > maximum optimizations possible for replicated table redistribution, > particularly distrib.c. Just by looking at that you will see that a node > level control is more than necessary. > -- > Michael Paquier > http://michael.otacoo.com -- StormDB - http://www.stormdb.com The Database Cloud |
From: Michael P. <mic...@gm...> - 2012-07-18 06:08:02
|
(Moving to dev...) On Wed, Jul 18, 2012 at 3:04 PM, Ashutosh Bapat < ash...@en...> wrote: > Hi Suzuki-san, > Thanks for this commit. Hope people configuring HA find it useful. > > It will also be helpful, to add documentation as to how to configure and > use this utility. How to specify which nodes it should monitor and if there > are any options in monitoring. If it allows various modes of notifying > failure, how to use those, etc. > Some docs have been written but Suzuki-san forgot to include them in the commit. He is just adding them now. > > On Wed, Jul 18, 2012 at 11:27 AM, Koichi Suzuki < > koi...@us...> wrote: > >> Project "Postgres-XC". >> >> The branch, master has been updated >> via f6bbc44a1afefd56fb7c7d54d7a921cc648d45c3 (commit) >> from 4371aadf993ff4fce458362b0de8d848bbe96e4a (commit) >> >> >> - Log ----------------------------------------------------------------- >> >> http://postgres-xc.git.sourceforge.net/git/gitweb.cgi?p=postgres-xc/postgres-xc;a=commitdiff;h=f6bbc44a1afefd56fb7c7d54d7a921cc648d45c3 >> >> commit f6bbc44a1afefd56fb7c7d54d7a921cc648d45c3 >> Author: Koichi Suzuki <koi...@gm...> >> Date: Wed Jul 18 14:59:24 2012 +0900 >> >> This commit adds pgxc_monitor utility used to check each Postgres-XC >> node is up and running. >> >> A contrib/pgxc_monitor/Makefile >> A contrib/pgxc_monitor/mcxt.c >> A contrib/pgxc_monitor/pgxc_monitor.c >> M doc-xc/src/sgml/contrib.sgmlin >> M doc-xc/src/sgml/filelist.sgmlin >> >> ----------------------------------------------------------------------- >> >> Summary of changes: >> contrib/{pgxc_clean => pgxc_monitor}/Makefile | 11 +- >> contrib/pgxc_monitor/mcxt.c | 77 +++++++++ >> contrib/pgxc_monitor/pgxc_monitor.c | 211 >> +++++++++++++++++++++++++ >> doc-xc/src/sgml/contrib.sgmlin | 1 + >> doc-xc/src/sgml/filelist.sgmlin | 3 + >> 5 files changed, 298 insertions(+), 5 deletions(-) >> copy contrib/{pgxc_clean => pgxc_monitor}/Makefile (78%) >> create mode 100644 contrib/pgxc_monitor/mcxt.c >> create mode 100644 contrib/pgxc_monitor/pgxc_monitor.c >> >> >> hooks/post-receive >> -- >> Postgres-XC >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-committers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-committers >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-committers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-committers > > -- Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-07-17 08:46:55
|
On Tue, Jul 17, 2012 at 5:35 PM, Ashutosh Bapat < ash...@en...> wrote: > Although it's a bug in 1.0, there will be very few (ideally 0) people > doing updates to catalogs using SQL. So, this isn't a candidate for > backpatching. If we hear about this bug from the field, we will add it. Why not... A bug is a bug, but feel free to do as you wish. -- Michael Paquier http://michael.otacoo.com |
From: Ashutosh B. <ash...@en...> - 2012-07-17 08:35:45
|
Although it's a bug in 1.0, there will be very few (ideally 0) people doing updates to catalogs using SQL. So, this isn't a candidate for backpatching. If we hear about this bug from the field, we will add it. On Tue, Jul 17, 2012 at 10:15 AM, Michael Paquier <mic...@gm... > wrote: > Hi, > > I haven't seen any problems with this patch. > This should also be backpatched to 1.0 stable. > > Regards, > > On Mon, Jul 16, 2012 at 3:10 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> Hi All, >> Trying to update a catalog in XC, reports error "could not find relation >> for oid = ...". >> >> PFA the fix for the same. >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Koichi S. <koi...@gm...> - 2012-07-17 05:22:19
|
I suppose global constraint requires similar additional feature as separate coordinator/datanode. There could be less overhead because now datanode has direct access to some global things, not via coordinator. I agree we need careful research and design, and goal of the gain. Regards; ---------- Koichi Suzuki 2012/7/17 Michael Paquier <mic...@gm...>: > > > On Tue, Jul 17, 2012 at 1:23 PM, Ashutosh Bapat > <ash...@en...> wrote: >> >> >> >> On Tue, Jul 17, 2012 at 8:10 AM, Andrei Martsinchyk >> <and...@gm...> wrote: >>> >>> It is great that you are going to merge Coordinator and Datanode. Mason >>> and I discussed this item but did not agreed upon, now I am getting >>> majority. We would be interested in participating the discussion. >> >> >> Before majority wins, I would give my vote against majority. ;) >> >> Please note that, I am not against doing coordinator and datanode merge, >> but may be the time is not right. We have quite a few things on our plate >> like constraints, HA inside XC, node addition/deletion, some SQL features. >> We might want to focus on those to make XC a good product feature-wise and >> stability-wise and then move on to the merge. >> >> I personally don't think coordinator and datanode merge is sure to give >> performance advantage. So, we may spend time merging these two only to find >> that it's not performing well, and thus backtrace wasting time. > > Just critically lowering inter-node data exchange and network load will > allow a good performance gain for sure :) > However I got the point about features we should do first like global > constraints. And what if global constraints are easier to do after we did > the merge? We don't know about that either. In this new architecture XC > would deal not with 2 layers of nodes, but only 1. > Globally it is worth to study the potential effort first. > -- > Michael Paquier > http://michael.otacoo.com |
From: Koichi S. <koi...@gm...> - 2012-07-17 05:19:43
|
Yes. We have lots of work issues toward 2.0 and unfortunately, I would say coordinator/datanode merge may not begin until 2.0 is out. I'd appreciate if anybody does some preliminary work for this, including a research of pros/cons of this approach, expected performance gain, if any, etc. Regards; ---------- Koichi Suzuki 2012/7/17 Ashutosh Bapat <ash...@en...>: > > > On Tue, Jul 17, 2012 at 8:10 AM, Andrei Martsinchyk > <and...@gm...> wrote: >> >> It is great that you are going to merge Coordinator and Datanode. Mason >> and I discussed this item but did not agreed upon, now I am getting >> majority. We would be interested in participating the discussion. > > > Before majority wins, I would give my vote against majority. ;) > > Please note that, I am not against doing coordinator and datanode merge, but > may be the time is not right. We have quite a few things on our plate like > constraints, HA inside XC, node addition/deletion, some SQL features. We > might want to focus on those to make XC a good product feature-wise and > stability-wise and then move on to the merge. > > I personally don't think coordinator and datanode merge is sure to give > performance advantage. So, we may spend time merging these two only to find > that it's not performing well, and thus backtrace wasting time. > >> >> Anyway, merging of Coordinator and Datanode and merging Proxy a differ in >> term of code changes. First basically involves pooler, executor and planner >> module changes, former needs a wrapper and integration into postmaster, >> probably some refactoring in gtm and gtm proxy modules. >> I think we can keep the proxy multithreaded, at least on the first >> integration stage. The pooler already linked with the pthreads stuff, as it >> is using libpq. >> Indeed, I think we would keep the socket communication between backend and >> pooler. I am not sure if Unix socket is any faster then TCP/IP socket, it >> would be possible to use Unix sockets as well. Later on, we would be able to >> use shared memory. If the proxy maintains a local copy of the GTM snapshot >> in shared memory it will be better if process make their local copies from >> that one, then send them around via sockets. >> >> >> 2012/7/17 Koichi Suzuki <koi...@gm...> >>> >>> 2012/7/16 Michael Paquier <mic...@gm...>: >>> > This merge idea looks nice. But I see 2 issues. >>> > 1) What is cruelly missing in the current design is the possibility to >>> > run a >>> > Postgres-XC node as a GTM-Proxy standalone. This means that this >>> > Postgres-XC >>> > node would only proxy messages to GTM and only that. It will need to >>> > reject >>> > other client applications. >>> > This would allow cascading of GTM Proxies and more flexibility in the >>> > system. >>> >>> Cascading GTM proxies is not supported yet and I'm not sure what gain >>> we have with it. I don't think it's reasonable to run >>> coordinator/node instance for this purpose anyway. >>> >>> > If you guys are able to add that, well we gain in code maintenance and >>> > keep >>> > the same architecture possible. >>> > 2) Performance? We will need to test that under really a heavy load >>> > before >>> > committing it. >>> >>> Okay. Only one reason I expect for performance is we can save >>> (local) socket communication between the backend and the proxy. >>> However, if we make proxy a separate process, just sharing the binary, >>> we could be suffered by another overhead (for example, pg_proc locks, >>> etc). >>> >>> Because of the difference in the process mode, integrating proxy with >>> coordinator/backend may need to rewrite the code. >>> >>> Regards; >>> --- >>> Koichi >>> >>> > -- >>> > Michael Paquier >>> > http://michael.otacoo.com >> >> >> >> >> -- >> Andrei Martsinchyk >> >> StormDB - http://www.stormdb.com >> The Database Cloud >> >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > |
From: Michael P. <mic...@gm...> - 2012-07-17 04:45:19
|
Hi, I haven't seen any problems with this patch. This should also be backpatched to 1.0 stable. Regards, On Mon, Jul 16, 2012 at 3:10 PM, Ashutosh Bapat < ash...@en...> wrote: > Hi All, > Trying to update a catalog in XC, reports error "could not find relation > for oid = ...". > > PFA the fix for the same. > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-07-17 04:37:44
|
On Tue, Jul 17, 2012 at 1:23 PM, Ashutosh Bapat < ash...@en...> wrote: > > > On Tue, Jul 17, 2012 at 8:10 AM, Andrei Martsinchyk < > and...@gm...> wrote: > >> It is great that you are going to merge Coordinator and Datanode. Mason >> and I discussed this item but did not agreed upon, now I am getting >> majority. We would be interested in participating the discussion. > > > Before majority wins, I would give my vote against majority. ;) > > Please note that, I am not against doing coordinator and datanode merge, > but may be the time is not right. We have quite a few things on our plate > like constraints, HA inside XC, node addition/deletion, some SQL features. > We might want to focus on those to make XC a good product feature-wise and > stability-wise and then move on to the merge. > > I personally don't think coordinator and datanode merge is sure to give > performance advantage. So, we may spend time merging these two only to find > that it's not performing well, and thus backtrace wasting time. > Just critically lowering inter-node data exchange and network load will allow a good performance gain for sure :) However I got the point about features we should do first like global constraints. And what if global constraints are easier to do after we did the merge? We don't know about that either. In this new architecture XC would deal not with 2 layers of nodes, but only 1. Globally it is worth to study the potential effort first. -- Michael Paquier http://michael.otacoo.com |
From: Ashutosh B. <ash...@en...> - 2012-07-17 04:28:34
|
Oh, I worded it the wrong way. Sorry. Here is better wording. Your mail said ---- In the last core team meeting, we agreed not to include general ALTER TABLE feature in CONCURRENT operation. ---- I didn't understand the purpose of word "CONCURRENT" in the current context. The feature that Michael is working on is non-concurrent ALTER TABLE (no other command can run with ALTER TABLE). I am not at all suggesting that we change it. I am saying that our design should be such that eventually we should be able to specify and execute ALTER TABLE subcommands like add column/delete column (which change the row layout) along-with redistribution commands. There is nothing going concurrent here. Hope that clarifies. On Tue, Jul 17, 2012 at 9:36 AM, Koichi Suzuki <koi...@gm...>wrote: > People don't like to stop application operation. It is very painful. > Current ALTER TABLE blocks application so we have to stop involved > applications for a while. CONCURRENT operation don't require this, > although it may take longer. > > This is very important feature. > > Regards; > ---------- > Koichi Suzuki > > > 2012/7/17 Ashutosh Bapat <ash...@en...>: > > Hi Suzuki-san, > > > > On Mon, Jul 16, 2012 at 8:09 PM, Koichi Suzuki < > koi...@gm...> > > wrote: > >> > >> 2012/7/16 Ashutosh Bapat <ash...@en...>: > >> > Finally I got off reviewing this patch. > >> > > >> > One of the problems, with this patch is extensibility. At some point > in > >> > future (and probably very near future), we should allow adding column > >> > (an > >> > example) and redistribution to be done at the same time, to reduce the > >> > total > >> > time of doing ALTER TABLE if it comes to adding column (an example) > and > >> > redistribute the table at the same time. Basically we should allow all > >> > the > >> > table rewriting to be done at the same time as the PostgreSQL ALTER > >> > TABLE > >> > allows to do. The method used here does not leave a room for such > >> > combined > >> > operation, as an extension in future. This means, that when it comes > to > >> > supporting the above said capability we have to rewrite the whole > thing > >> > (leaving may be transformation and node manipulation aside). That's > why, > >> > I > >> > would like to see a fix in ATRewriteTable, where we have sequence for > >> > every > >> > row 1 get row, 2. rewrite the row 3. write the new row. > >> > >> > >> I'm afraid this is too much for the current work. In the last core > >> team meeting, we agreed not to include general ALTER TABLE feature in > >> CONCURRENT operation. Although I think it's better to have such > >> extension ready in the current feature, we have much more carried over > >> beyond 2.0, it's also important to have features really work. > >> > > > > I don't understand the purpose behind concurrent operations? Do you mean > > specifying multiple ALTER TABLE subcommands in the same ALTER TABLE > command? > > OR do you mean online redistribution. I am presuming you mean the first > one. > > But still need to confirm. > > > >> > >> > > >> > Anyway, I have following comments for the patch itself, > >> > 1. Need better name for PgxcClassAlter(). > >> > >> Any idea? > >> > >> > 2. In ATController, why do you need to move ATCheckCmd below > ATPrepCmd? > >> > 3. Comments on line 2933 and 2941 can be worded as "Perform > >> > pre-catalog-update redistribution operations" and "Perform > >> > post-catalog-update redistribution operations" > >> > 4. Why can't redistribution be run in a transaction block? Does the > >> > redistribution run as a transaction? What happens if the server > crashes > >> > while redistribution is being done at various stages of its progress? > >> > >> Hmm, this could be some limitation. PostgreSQL's ALTER TABLE, unlike > >> MySQL and older version of Oracle, can run in a transaction block and > >> most users would like this feature too. > >> > >> > 5. Do you want to rename BuildDistribCommands() as > >> > BuildReDistribCommands()? > >> > 6. In function BuildDistribCommands(), What does variable new_oids > >> > signify? > >> > I think it's signifying the new node oids, if so please use name > >> > accordingly. > >> > 7. The names of function tree_build_entry() and its minions look > pretty > >> > generic, we need some prefix to these functions like pgxc_redist_ or > >> > something like that. BTW, what's the "tree" there indicate? There is > >> > nothing > >> > like tree that is being built, it's just a list of commands. > >> > 8. Why are you using repalloc in tree_add_single_command(), you can > >> > rather > >> > create a list and append to that list. > >> > 9. We don't need two separate functions Pre and Post - Update(), it > can > >> > be a > >> > single function which takes the Pre/Post as flag and runs the relevant > >> > commands. BTW, just PreUpdate does not show its real meaning, it > should > >> > be > >> > something like PreCatalogUpdate() or something like that. > >> > 10. Why every catalog change to pgxc_class invalidates the cache? We > >> > should > >> > do it only once for a given command. > >> > 11. The functions add_oid_list, delete_oid_list etc. are using oid > >> > arrays, > >> > then why use the suffice _list() in those functions? I do not like the > >> > frequence repalloc that happens in both these functions. Worst is the > >> > movement of array element in delete_node_list(). Any mistake here > would > >> > be > >> > disastrous. PostgreSQL code is very generous in using memory to keep > >> > things > >> > simple. You can use lists or bitmaps if you want to save the space, > but > >> > not > >> > repalloc. > >> > 12. Instead of a single file distrib directory, you can use locator > >> > directory with distrib.c file (better if you could use name like > >> > at_distrib > >> > or redistrib, since the files really belong to ALTER TABLE? > >> > 13. In makeRemoteCopyOptions you are using palloc0(), which sets all > the > >> > memory with 0, so bools automatically get value false and NULL lists > as > >> > NIL, > >> > you don't need to set those explicitly. > >> > 14. I do not see any tests related to sanity of views, rules and other > >> > objects which depend upon the table, after the table is redistributed > >> > using > >> > this method. May be it's a good idea to provide a prologue at the > >> > beginning > >> > of the testcase specifying how the testcase is laid out. > >> > > >> > > >> > On Mon, Jul 16, 2012 at 4:56 PM, Ashutosh Bapat > >> > <ash...@en...> wrote: > >> >> > >> >> > >> >>> In this case you do not need any code! You could also do a simple > >> >>> CREATE > >> >>> TABLE AS to redistribution the table as you wish to a new table, > drop > >> >>> the > >> >>> old table, and rename the new table with the old name. This could > also > >> >>> be > >> >>> done with 1.0. > >> >>> You should also have a look at my latest patch, it already includes > >> >>> the > >> >>> maximum optimizations possible for replicated table redistribution, > >> >>> particularly distrib.c. Just by looking at that you will see that a > >> >>> node > >> >>> level control is more than necessary. > >> >> > >> >> > >> >> This method would change the OID of the table and thus invalidate all > >> >> the > >> >> view definitions, rules etc. depending on this table. We don't want > >> >> that to > >> >> happen. The function would not change the OID, but would write the > data > >> >> in > >> >> the table's space itself. > >> >> > >> >>> > >> >>> -- > >> >>> Michael Paquier > >> >>> http://michael.otacoo.com > >> >> > >> >> > >> >> > >> >> > >> >> -- > >> >> Best Wishes, > >> >> Ashutosh Bapat > >> >> EntepriseDB Corporation > >> >> The Enterprise Postgres Company > >> >> > >> > > >> > > >> > > >> > -- > >> > Best Wishes, > >> > Ashutosh Bapat > >> > EntepriseDB Corporation > >> > The Enterprise Postgres Company > >> > > >> > > >> > > >> > > ------------------------------------------------------------------------------ > >> > Live Security Virtual Conference > >> > Exclusive live event will cover all the ways today's security and > >> > threat landscape has changed and how IT managers can respond. > >> > Discussions > >> > will include endpoint security, mobile security and the latest in > >> > malware > >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> > _______________________________________________ > >> > Postgres-xc-developers mailing list > >> > Pos...@li... > >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >> > > > > > > > > > > > -- > > Best Wishes, > > Ashutosh Bapat > > EntepriseDB Corporation > > The Enterprise Postgres Company > > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Ashutosh B. <ash...@en...> - 2012-07-17 04:23:30
|
On Tue, Jul 17, 2012 at 8:10 AM, Andrei Martsinchyk < and...@gm...> wrote: > It is great that you are going to merge Coordinator and Datanode. Mason > and I discussed this item but did not agreed upon, now I am getting > majority. We would be interested in participating the discussion. Before majority wins, I would give my vote against majority. ;) Please note that, I am not against doing coordinator and datanode merge, but may be the time is not right. We have quite a few things on our plate like constraints, HA inside XC, node addition/deletion, some SQL features. We might want to focus on those to make XC a good product feature-wise and stability-wise and then move on to the merge. I personally don't think coordinator and datanode merge is sure to give performance advantage. So, we may spend time merging these two only to find that it's not performing well, and thus backtrace wasting time. > Anyway, merging of Coordinator and Datanode and merging Proxy a differ in > term of code changes. First basically involves pooler, executor and planner > module changes, former needs a wrapper and integration into postmaster, > probably some refactoring in gtm and gtm proxy modules. > I think we can keep the proxy multithreaded, at least on the first > integration stage. The pooler already linked with the pthreads stuff, as it > is using libpq. > Indeed, I think we would keep the socket communication between backend and > pooler. I am not sure if Unix socket is any faster then TCP/IP socket, it > would be possible to use Unix sockets as well. Later on, we would be able > to use shared memory. If the proxy maintains a local copy of the GTM > snapshot in shared memory it will be better if process make their local > copies from that one, then send them around via sockets. > > > 2012/7/17 Koichi Suzuki <koi...@gm...> > >> 2012/7/16 Michael Paquier <mic...@gm...>: >> > This merge idea looks nice. But I see 2 issues. >> > 1) What is cruelly missing in the current design is the possibility to >> run a >> > Postgres-XC node as a GTM-Proxy standalone. This means that this >> Postgres-XC >> > node would only proxy messages to GTM and only that. It will need to >> reject >> > other client applications. >> > This would allow cascading of GTM Proxies and more flexibility in the >> > system. >> >> Cascading GTM proxies is not supported yet and I'm not sure what gain >> we have with it. I don't think it's reasonable to run >> coordinator/node instance for this purpose anyway. >> >> > If you guys are able to add that, well we gain in code maintenance and >> keep >> > the same architecture possible. >> > 2) Performance? We will need to test that under really a heavy load >> before >> > committing it. >> >> Okay. Only one reason I expect for performance is we can save >> (local) socket communication between the backend and the proxy. >> However, if we make proxy a separate process, just sharing the binary, >> we could be suffered by another overhead (for example, pg_proc locks, >> etc). >> >> Because of the difference in the process mode, integrating proxy with >> coordinator/backend may need to rewrite the code. >> >> Regards; >> --- >> Koichi >> >> > -- >> > Michael Paquier >> > http://michael.otacoo.com >> > > > > -- > Andrei Martsinchyk > > StormDB - http://www.stormdb.com > The Database Cloud > > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Koichi S. <ko...@in...> - 2012-07-17 04:17:03
|
Because this is a bulk operation, applications may need full access to the table, I mean, read and write during the redistribution. May be providec by lower level of locking. People would like to see latest view of the table so running whole ALTER TABLE as a single transaction may not be satisfactory. Regards; --- Koichi Suzuki On Tue, 17 Jul 2012 13:08:46 +0900 Michael Paquier <mic...@gm...> wrote: > On Tue, Jul 17, 2012 at 1:06 PM, Koichi Suzuki <koi...@gm...>wrote: > > > People don't like to stop application operation. It is very painful. > > Current ALTER TABLE blocks application so we have to stop involved > > applications for a while. CONCURRENT operation don't require this, > > although it may take longer. > > > By blocking application, I think it means that an exclusive lock is taken > on the table being altered, so other sessions cannot access to it. > In this case CONCURRENT means that we do an ALTER TABLE by taking a lower > locking level and authorize other sessions to read and/or modify data > altered. > -- > Michael Paquier > http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-07-17 04:08:54
|
On Tue, Jul 17, 2012 at 1:06 PM, Koichi Suzuki <koi...@gm...>wrote: > People don't like to stop application operation. It is very painful. > Current ALTER TABLE blocks application so we have to stop involved > applications for a while. CONCURRENT operation don't require this, > although it may take longer. > By blocking application, I think it means that an exclusive lock is taken on the table being altered, so other sessions cannot access to it. In this case CONCURRENT means that we do an ALTER TABLE by taking a lower locking level and authorize other sessions to read and/or modify data altered. -- Michael Paquier http://michael.otacoo.com |
From: Koichi S. <koi...@gm...> - 2012-07-17 04:08:07
|
People don't like to stop application operation. It is very painful. Current ALTER TABLE blocks application so we have to stop involved applications for a while. CONCURRENT operation don't require this, although it may take longer. Regards; ---------- Koichi Suzuki 2012/7/17 Ashutosh Bapat <ash...@en...>: > Hi Suzuki-san, > > On Mon, Jul 16, 2012 at 8:09 PM, Koichi Suzuki <koi...@gm...> > wrote: >> >> 2012/7/16 Ashutosh Bapat <ash...@en...>: >> > Finally I got off reviewing this patch. >> > >> > One of the problems, with this patch is extensibility. At some point in >> > future (and probably very near future), we should allow adding column >> > (an >> > example) and redistribution to be done at the same time, to reduce the >> > total >> > time of doing ALTER TABLE if it comes to adding column (an example) and >> > redistribute the table at the same time. Basically we should allow all >> > the >> > table rewriting to be done at the same time as the PostgreSQL ALTER >> > TABLE >> > allows to do. The method used here does not leave a room for such >> > combined >> > operation, as an extension in future. This means, that when it comes to >> > supporting the above said capability we have to rewrite the whole thing >> > (leaving may be transformation and node manipulation aside). That's why, >> > I >> > would like to see a fix in ATRewriteTable, where we have sequence for >> > every >> > row 1 get row, 2. rewrite the row 3. write the new row. >> >> >> I'm afraid this is too much for the current work. In the last core >> team meeting, we agreed not to include general ALTER TABLE feature in >> CONCURRENT operation. Although I think it's better to have such >> extension ready in the current feature, we have much more carried over >> beyond 2.0, it's also important to have features really work. >> > > I don't understand the purpose behind concurrent operations? Do you mean > specifying multiple ALTER TABLE subcommands in the same ALTER TABLE command? > OR do you mean online redistribution. I am presuming you mean the first one. > But still need to confirm. > >> >> > >> > Anyway, I have following comments for the patch itself, >> > 1. Need better name for PgxcClassAlter(). >> >> Any idea? >> >> > 2. In ATController, why do you need to move ATCheckCmd below ATPrepCmd? >> > 3. Comments on line 2933 and 2941 can be worded as "Perform >> > pre-catalog-update redistribution operations" and "Perform >> > post-catalog-update redistribution operations" >> > 4. Why can't redistribution be run in a transaction block? Does the >> > redistribution run as a transaction? What happens if the server crashes >> > while redistribution is being done at various stages of its progress? >> >> Hmm, this could be some limitation. PostgreSQL's ALTER TABLE, unlike >> MySQL and older version of Oracle, can run in a transaction block and >> most users would like this feature too. >> >> > 5. Do you want to rename BuildDistribCommands() as >> > BuildReDistribCommands()? >> > 6. In function BuildDistribCommands(), What does variable new_oids >> > signify? >> > I think it's signifying the new node oids, if so please use name >> > accordingly. >> > 7. The names of function tree_build_entry() and its minions look pretty >> > generic, we need some prefix to these functions like pgxc_redist_ or >> > something like that. BTW, what's the "tree" there indicate? There is >> > nothing >> > like tree that is being built, it's just a list of commands. >> > 8. Why are you using repalloc in tree_add_single_command(), you can >> > rather >> > create a list and append to that list. >> > 9. We don't need two separate functions Pre and Post - Update(), it can >> > be a >> > single function which takes the Pre/Post as flag and runs the relevant >> > commands. BTW, just PreUpdate does not show its real meaning, it should >> > be >> > something like PreCatalogUpdate() or something like that. >> > 10. Why every catalog change to pgxc_class invalidates the cache? We >> > should >> > do it only once for a given command. >> > 11. The functions add_oid_list, delete_oid_list etc. are using oid >> > arrays, >> > then why use the suffice _list() in those functions? I do not like the >> > frequence repalloc that happens in both these functions. Worst is the >> > movement of array element in delete_node_list(). Any mistake here would >> > be >> > disastrous. PostgreSQL code is very generous in using memory to keep >> > things >> > simple. You can use lists or bitmaps if you want to save the space, but >> > not >> > repalloc. >> > 12. Instead of a single file distrib directory, you can use locator >> > directory with distrib.c file (better if you could use name like >> > at_distrib >> > or redistrib, since the files really belong to ALTER TABLE? >> > 13. In makeRemoteCopyOptions you are using palloc0(), which sets all the >> > memory with 0, so bools automatically get value false and NULL lists as >> > NIL, >> > you don't need to set those explicitly. >> > 14. I do not see any tests related to sanity of views, rules and other >> > objects which depend upon the table, after the table is redistributed >> > using >> > this method. May be it's a good idea to provide a prologue at the >> > beginning >> > of the testcase specifying how the testcase is laid out. >> > >> > >> > On Mon, Jul 16, 2012 at 4:56 PM, Ashutosh Bapat >> > <ash...@en...> wrote: >> >> >> >> >> >>> In this case you do not need any code! You could also do a simple >> >>> CREATE >> >>> TABLE AS to redistribution the table as you wish to a new table, drop >> >>> the >> >>> old table, and rename the new table with the old name. This could also >> >>> be >> >>> done with 1.0. >> >>> You should also have a look at my latest patch, it already includes >> >>> the >> >>> maximum optimizations possible for replicated table redistribution, >> >>> particularly distrib.c. Just by looking at that you will see that a >> >>> node >> >>> level control is more than necessary. >> >> >> >> >> >> This method would change the OID of the table and thus invalidate all >> >> the >> >> view definitions, rules etc. depending on this table. We don't want >> >> that to >> >> happen. The function would not change the OID, but would write the data >> >> in >> >> the table's space itself. >> >> >> >>> >> >>> -- >> >>> Michael Paquier >> >>> http://michael.otacoo.com >> >> >> >> >> >> >> >> >> >> -- >> >> Best Wishes, >> >> Ashutosh Bapat >> >> EntepriseDB Corporation >> >> The Enterprise Postgres Company >> >> >> > >> > >> > >> > -- >> > Best Wishes, >> > Ashutosh Bapat >> > EntepriseDB Corporation >> > The Enterprise Postgres Company >> > >> > >> > >> > ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> > Discussions >> > will include endpoint security, mobile security and the latest in >> > malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > |
From: Koichi S. <koi...@gm...> - 2012-07-17 04:06:30
|
People don't like to stop application operation. It is very painful. Current ALTER TABLE blocks application so we have to stop involved applications for a while. CONCURRENT operation don't require this, although it may take longer. This is very important feature. Regards; ---------- Koichi Suzuki 2012/7/17 Ashutosh Bapat <ash...@en...>: > Hi Suzuki-san, > > On Mon, Jul 16, 2012 at 8:09 PM, Koichi Suzuki <koi...@gm...> > wrote: >> >> 2012/7/16 Ashutosh Bapat <ash...@en...>: >> > Finally I got off reviewing this patch. >> > >> > One of the problems, with this patch is extensibility. At some point in >> > future (and probably very near future), we should allow adding column >> > (an >> > example) and redistribution to be done at the same time, to reduce the >> > total >> > time of doing ALTER TABLE if it comes to adding column (an example) and >> > redistribute the table at the same time. Basically we should allow all >> > the >> > table rewriting to be done at the same time as the PostgreSQL ALTER >> > TABLE >> > allows to do. The method used here does not leave a room for such >> > combined >> > operation, as an extension in future. This means, that when it comes to >> > supporting the above said capability we have to rewrite the whole thing >> > (leaving may be transformation and node manipulation aside). That's why, >> > I >> > would like to see a fix in ATRewriteTable, where we have sequence for >> > every >> > row 1 get row, 2. rewrite the row 3. write the new row. >> >> >> I'm afraid this is too much for the current work. In the last core >> team meeting, we agreed not to include general ALTER TABLE feature in >> CONCURRENT operation. Although I think it's better to have such >> extension ready in the current feature, we have much more carried over >> beyond 2.0, it's also important to have features really work. >> > > I don't understand the purpose behind concurrent operations? Do you mean > specifying multiple ALTER TABLE subcommands in the same ALTER TABLE command? > OR do you mean online redistribution. I am presuming you mean the first one. > But still need to confirm. > >> >> > >> > Anyway, I have following comments for the patch itself, >> > 1. Need better name for PgxcClassAlter(). >> >> Any idea? >> >> > 2. In ATController, why do you need to move ATCheckCmd below ATPrepCmd? >> > 3. Comments on line 2933 and 2941 can be worded as "Perform >> > pre-catalog-update redistribution operations" and "Perform >> > post-catalog-update redistribution operations" >> > 4. Why can't redistribution be run in a transaction block? Does the >> > redistribution run as a transaction? What happens if the server crashes >> > while redistribution is being done at various stages of its progress? >> >> Hmm, this could be some limitation. PostgreSQL's ALTER TABLE, unlike >> MySQL and older version of Oracle, can run in a transaction block and >> most users would like this feature too. >> >> > 5. Do you want to rename BuildDistribCommands() as >> > BuildReDistribCommands()? >> > 6. In function BuildDistribCommands(), What does variable new_oids >> > signify? >> > I think it's signifying the new node oids, if so please use name >> > accordingly. >> > 7. The names of function tree_build_entry() and its minions look pretty >> > generic, we need some prefix to these functions like pgxc_redist_ or >> > something like that. BTW, what's the "tree" there indicate? There is >> > nothing >> > like tree that is being built, it's just a list of commands. >> > 8. Why are you using repalloc in tree_add_single_command(), you can >> > rather >> > create a list and append to that list. >> > 9. We don't need two separate functions Pre and Post - Update(), it can >> > be a >> > single function which takes the Pre/Post as flag and runs the relevant >> > commands. BTW, just PreUpdate does not show its real meaning, it should >> > be >> > something like PreCatalogUpdate() or something like that. >> > 10. Why every catalog change to pgxc_class invalidates the cache? We >> > should >> > do it only once for a given command. >> > 11. The functions add_oid_list, delete_oid_list etc. are using oid >> > arrays, >> > then why use the suffice _list() in those functions? I do not like the >> > frequence repalloc that happens in both these functions. Worst is the >> > movement of array element in delete_node_list(). Any mistake here would >> > be >> > disastrous. PostgreSQL code is very generous in using memory to keep >> > things >> > simple. You can use lists or bitmaps if you want to save the space, but >> > not >> > repalloc. >> > 12. Instead of a single file distrib directory, you can use locator >> > directory with distrib.c file (better if you could use name like >> > at_distrib >> > or redistrib, since the files really belong to ALTER TABLE? >> > 13. In makeRemoteCopyOptions you are using palloc0(), which sets all the >> > memory with 0, so bools automatically get value false and NULL lists as >> > NIL, >> > you don't need to set those explicitly. >> > 14. I do not see any tests related to sanity of views, rules and other >> > objects which depend upon the table, after the table is redistributed >> > using >> > this method. May be it's a good idea to provide a prologue at the >> > beginning >> > of the testcase specifying how the testcase is laid out. >> > >> > >> > On Mon, Jul 16, 2012 at 4:56 PM, Ashutosh Bapat >> > <ash...@en...> wrote: >> >> >> >> >> >>> In this case you do not need any code! You could also do a simple >> >>> CREATE >> >>> TABLE AS to redistribution the table as you wish to a new table, drop >> >>> the >> >>> old table, and rename the new table with the old name. This could also >> >>> be >> >>> done with 1.0. >> >>> You should also have a look at my latest patch, it already includes >> >>> the >> >>> maximum optimizations possible for replicated table redistribution, >> >>> particularly distrib.c. Just by looking at that you will see that a >> >>> node >> >>> level control is more than necessary. >> >> >> >> >> >> This method would change the OID of the table and thus invalidate all >> >> the >> >> view definitions, rules etc. depending on this table. We don't want >> >> that to >> >> happen. The function would not change the OID, but would write the data >> >> in >> >> the table's space itself. >> >> >> >>> >> >>> -- >> >>> Michael Paquier >> >>> http://michael.otacoo.com >> >> >> >> >> >> >> >> >> >> -- >> >> Best Wishes, >> >> Ashutosh Bapat >> >> EntepriseDB Corporation >> >> The Enterprise Postgres Company >> >> >> > >> > >> > >> > -- >> > Best Wishes, >> > Ashutosh Bapat >> > EntepriseDB Corporation >> > The Enterprise Postgres Company >> > >> > >> > >> > ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> > Discussions >> > will include endpoint security, mobile security and the latest in >> > malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > |
From: Ashutosh B. <ash...@en...> - 2012-07-17 04:03:21
|
Hi Suzuki-san, On Mon, Jul 16, 2012 at 8:09 PM, Koichi Suzuki <koi...@gm...>wrote: > 2012/7/16 Ashutosh Bapat <ash...@en...>: > > Finally I got off reviewing this patch. > > > > One of the problems, with this patch is extensibility. At some point in > > future (and probably very near future), we should allow adding column (an > > example) and redistribution to be done at the same time, to reduce the > total > > time of doing ALTER TABLE if it comes to adding column (an example) and > > redistribute the table at the same time. Basically we should allow all > the > > table rewriting to be done at the same time as the PostgreSQL ALTER TABLE > > allows to do. The method used here does not leave a room for such > combined > > operation, as an extension in future. This means, that when it comes to > > supporting the above said capability we have to rewrite the whole thing > > (leaving may be transformation and node manipulation aside). That's why, > I > > would like to see a fix in ATRewriteTable, where we have sequence for > every > > row 1 get row, 2. rewrite the row 3. write the new row. > > > I'm afraid this is too much for the current work. In the last core > team meeting, we agreed not to include general ALTER TABLE feature in > CONCURRENT operation. Although I think it's better to have such > extension ready in the current feature, we have much more carried over > beyond 2.0, it's also important to have features really work. > > I don't understand the purpose behind concurrent operations? Do you mean specifying multiple ALTER TABLE subcommands in the same ALTER TABLE command? OR do you mean online redistribution. I am presuming you mean the first one. But still need to confirm. > > > > Anyway, I have following comments for the patch itself, > > 1. Need better name for PgxcClassAlter(). > > Any idea? > > > 2. In ATController, why do you need to move ATCheckCmd below ATPrepCmd? > > 3. Comments on line 2933 and 2941 can be worded as "Perform > > pre-catalog-update redistribution operations" and "Perform > > post-catalog-update redistribution operations" > > 4. Why can't redistribution be run in a transaction block? Does the > > redistribution run as a transaction? What happens if the server crashes > > while redistribution is being done at various stages of its progress? > > Hmm, this could be some limitation. PostgreSQL's ALTER TABLE, unlike > MySQL and older version of Oracle, can run in a transaction block and > most users would like this feature too. > > > 5. Do you want to rename BuildDistribCommands() as > BuildReDistribCommands()? > > 6. In function BuildDistribCommands(), What does variable new_oids > signify? > > I think it's signifying the new node oids, if so please use name > > accordingly. > > 7. The names of function tree_build_entry() and its minions look pretty > > generic, we need some prefix to these functions like pgxc_redist_ or > > something like that. BTW, what's the "tree" there indicate? There is > nothing > > like tree that is being built, it's just a list of commands. > > 8. Why are you using repalloc in tree_add_single_command(), you can > rather > > create a list and append to that list. > > 9. We don't need two separate functions Pre and Post - Update(), it can > be a > > single function which takes the Pre/Post as flag and runs the relevant > > commands. BTW, just PreUpdate does not show its real meaning, it should > be > > something like PreCatalogUpdate() or something like that. > > 10. Why every catalog change to pgxc_class invalidates the cache? We > should > > do it only once for a given command. > > 11. The functions add_oid_list, delete_oid_list etc. are using oid > arrays, > > then why use the suffice _list() in those functions? I do not like the > > frequence repalloc that happens in both these functions. Worst is the > > movement of array element in delete_node_list(). Any mistake here would > be > > disastrous. PostgreSQL code is very generous in using memory to keep > things > > simple. You can use lists or bitmaps if you want to save the space, but > not > > repalloc. > > 12. Instead of a single file distrib directory, you can use locator > > directory with distrib.c file (better if you could use name like > at_distrib > > or redistrib, since the files really belong to ALTER TABLE? > > 13. In makeRemoteCopyOptions you are using palloc0(), which sets all the > > memory with 0, so bools automatically get value false and NULL lists as > NIL, > > you don't need to set those explicitly. > > 14. I do not see any tests related to sanity of views, rules and other > > objects which depend upon the table, after the table is redistributed > using > > this method. May be it's a good idea to provide a prologue at the > beginning > > of the testcase specifying how the testcase is laid out. > > > > > > On Mon, Jul 16, 2012 at 4:56 PM, Ashutosh Bapat > > <ash...@en...> wrote: > >> > >> > >>> In this case you do not need any code! You could also do a simple > CREATE > >>> TABLE AS to redistribution the table as you wish to a new table, drop > the > >>> old table, and rename the new table with the old name. This could also > be > >>> done with 1.0. > >>> You should also have a look at my latest patch, it already includes the > >>> maximum optimizations possible for replicated table redistribution, > >>> particularly distrib.c. Just by looking at that you will see that a > node > >>> level control is more than necessary. > >> > >> > >> This method would change the OID of the table and thus invalidate all > the > >> view definitions, rules etc. depending on this table. We don't want > that to > >> happen. The function would not change the OID, but would write the data > in > >> the table's space itself. > >> > >>> > >>> -- > >>> Michael Paquier > >>> http://michael.otacoo.com > >> > >> > >> > >> > >> -- > >> Best Wishes, > >> Ashutosh Bapat > >> EntepriseDB Corporation > >> The Enterprise Postgres Company > >> > > > > > > > > -- > > Best Wishes, > > Ashutosh Bapat > > EntepriseDB Corporation > > The Enterprise Postgres Company > > > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Postgres-xc-developers mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |