From: Michael P. <mic...@gm...> - 2012-02-14 10:15:59

> > 3) I found an assertion error at portal cleanup with your current patch
> > for the test privileges.
> > (gdb) bt
> > #0 0x00007fad4fe85a75 in raise () from /lib/libc.so.6
> > #1 0x00007fad4fe895c0 in abort () from /lib/libc.so.6
> > #2 0x0000000000857b76 in ExceptionalCondition (conditionName=0xa44a90
> > "!(portal->cleanup == ((void *)0))",
> > errorType=0xa4471c "FailedAssertion", fileName=0xa44710 "portalmem.c",
> > lineNumber=792) at assert.c:57
>
> I think I commented out this assertion as part of the second patch.
> Maybe you forgot to apply that.

Yeah, perhaps it was the same. Let's be careful about that.

> > As you mentioned before, this may be a PostgreSQL bug.
> > 4) There are minor issues with tests guc, temp (functions using temporary
> > objects in their expressions),
>
> Again, the is_temp flag is not set properly at a few places and the
> second patch fixes that. I think there is still one extra failure
> regarding temp objects that I might have overlooked.

Like other things, is_temp is related to the transaction; perhaps it
would be better to move it inside TransactionData?

> > 5) Why not replace the error message with an assert in
> > GenerateBeginCommand? The sooner the better to detect bugs.
>
> Which error message are you referring to? BTW, I renamed
> GenerateBeginCommand to generate_begin_command because it's a static
> function (I know we are not very consistent), and I don't see any
> error message there.

OK, I saw a comment in GenerateBeginCommand about replacing the error
message with an assert. I only looked at the patch in this area, not at
the changed file, so I may have misunderstood something.

> > 6) In AtEOXact_GlobalTxn, you can use something like
> > CommitPreparedTranGTM in gtm.c to abort 2 transaction ids at the same
> > time :)
>
> Yeah, maybe :-) BTW, apart from combining the messages, does it serve
> any other purpose? Do we need to send them at once for some
> correctness purpose?

CommitPreparedTranGTM is just here for performance purposes. It avoids
going to GTM twice.

> > 8) In pgxc_node_begin, we shouldn't really worry about non-shippable
> > expressions that may be sent down to datanodes. I think it is more the
> > planner's responsibility to evaluate that, not the remote node
> > executor's. The planner already has good evaluation APIs for that. It
> > is then the DBA's responsibility if customized expressions are defined
> > as pushable when they should not be.
>
> Do you mean the XXX comment I have written about the read-only vs
> read-write transaction? Yeah, pgxc_node_begin() should just use
> whatever the caller has specified. I think the comment would be more
> appropriately placed in do_query() or any other such functions. I just
> wanted to add a caution about treating read-write statements as
> read-only, thus possibly compromising database state.

OK, understood, it's also sufficient like this.

> > 9) It is a design thought, but basically write and read connections are
> > part of the current transaction, so why not put them in TransactionData
> > instead of RemoteXactState in execRemote.c? Just wondering...
>
> I tried to put the remote transaction handling code in execRemote.c
> and the TransactionData structure is opaque to other modules. Also,
> the nodes are tracked in execRemote.c and also used in the same
> module. So it made sense to have them as static members in
> execRemote.c

OK, it was just a thought. It also makes sense like this.

> > It is pretty sad that the error messages are not correctly returned back
> > to the client even if transaction handling is cleaner and the basics are
> > in. This is not an issue related to your patch though.
>
> Yeah. That's one thing I am wondering about too. Especially, when
> different nodes report failures for different reasons at PREPARE or
> COMMIT time, which error should we report to the client? This is not
> just an issue with transaction management, but wherever we have used
> a combiner, I wonder how we report multiple failures to the client.
> Should we encapsulate that as some generic error message from the
> coordinator or should we report the first error or all errors from all
> the connections? I think we need to take a call on this.

There are a lot of ways to go about that. But if we think that XC should
be transparent to the application, we should return the first error
message. We could decide on different strategies depending on the
combiner type.
--
Michael Paquier
http://michael.otacoo.com
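
A minimal sketch of the two reporting strategies discussed above (first error only, or all errors), selected per combiner type. It is written in Python for brevity and is not Postgres-XC code: `NodeError`, `first_error`, `all_errors`, `report` and the combiner type names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class NodeError:
    """One failure reported by one remote node (hypothetical type)."""
    node: str
    message: str


def first_error(errors: List[NodeError]) -> Optional[str]:
    # "Transparent" policy: the client sees a single message, as it
    # would from a single PostgreSQL server.
    return errors[0].message if errors else None


def all_errors(errors: List[NodeError]) -> Optional[str]:
    # Verbose policy: every node's message, prefixed by the node name.
    if not errors:
        return None
    return "; ".join(f"{e.node}: {e.message}" for e in errors)


# Assumption: combiner types are identified by name; these names are
# illustrative only and do not come from the Postgres-XC sources.
POLICY_BY_COMBINER: Dict[str, Callable[[List[NodeError]], Optional[str]]] = {
    "transaction": all_errors,
}


def report(combiner_type: str, errors: List[NodeError]) -> Optional[str]:
    # Fall back to the first-error policy for any other combiner type.
    policy = POLICY_BY_COMBINER.get(combiner_type, first_error)
    return policy(errors)
```

With two failing nodes, `report("select", errs)` would return only the first node's message, while `report("transaction", errs)` would return both, which is one way different strategies per combiner type could look.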
From: Pavan D. <pav...@gm...> - 2012-02-14 08:43:49

Thanks Michael for a quick review. I know you are busy with other
things, so I appreciate this even more.

On Tue, Feb 14, 2012 at 7:19 AM, Michael Paquier <mic...@gm...> wrote:
> Hi Pavan,
>
> I noticed that your code is not completely realigned with current head. I
> added with my last commit a flag in TransactionData to track if local
> parameters need to be fired for the current transaction after launching
> begin on remote connections.

OK. I might not have pulled the very latest sources. Will do that and
rebase the patch.

> You indeed deleted a loooot of code which will make future merges even
> easier.
>
> I also have a couple of questions/comments.
> 1) Isn't it dangerous to remove the lock on vacuum analyze? They need to
> be added to the local snapshot.
> We saw some visibility issues in this area in the past and I am wondering if
> this may raise issues, you are the autovacuum specialist though.

Yes. I think I need to fix that. What I did not like about the old code
was that it was adding procs while assigning XIDs. That's not the right
way and I think I saw failures because of that while testing. We should
follow the same logic as followed by other processes. We in fact don't
even need to maintain a separate array. We can carve out an array from
the main procArray. Will fix this.

> 2) You may need to add the handling for local parameters with the flag
> isLocalParameterUsed in TransactionData. This flag should be reset correctly
> when a transaction is finished (AbortTransaction, CommitTransaction,
> PrepareTransaction).

OK. Will look at it after pulling the latest code.

> 3) I found an assertion error at portal cleanup with your current patch for
> the test privileges.
> (gdb) bt
> #0 0x00007fad4fe85a75 in raise () from /lib/libc.so.6
> #1 0x00007fad4fe895c0 in abort () from /lib/libc.so.6
> #2 0x0000000000857b76 in ExceptionalCondition (conditionName=0xa44a90
> "!(portal->cleanup == ((void *)0))",
> errorType=0xa4471c "FailedAssertion", fileName=0xa44710 "portalmem.c",
> lineNumber=792) at assert.c:57

I think I commented out this assertion as part of the second patch.
Maybe you forgot to apply that.

> As you mentioned before, this may be a PostgreSQL bug.
> 4) There are minor issues with tests guc, temp (functions using temporary
> objects in their expressions),

Again, the is_temp flag is not set properly at a few places and the
second patch fixes that. I think there is still one extra failure
regarding temp objects that I might have overlooked.

> 5) Why not replace the error message with an assert in GenerateBeginCommand?
> The sooner the better to detect bugs.

Which error message are you referring to? BTW, I renamed
GenerateBeginCommand to generate_begin_command because it's a static
function (I know we are not very consistent), and I don't see any
error message there.

> 6) In AtEOXact_GlobalTxn, you can use something like CommitPreparedTranGTM
> in gtm.c to abort 2 transaction ids at the same time :)

Yeah, maybe :-) BTW, apart from combining the messages, does it serve
any other purpose? Do we need to send them at once for some
correctness purpose?

> 7) I saw that AtPrepare_Remote includes the local node, this is not always
> necessary for explicit 2PC as you know... This should be modified when you
> add support for that.

Yeah. That would be reworked when I add explicit 2PC support.

> 8) In pgxc_node_begin, we shouldn't really worry about non-shippable
> expressions that may be sent down to datanodes. I think it is more the
> planner's responsibility to evaluate that, not the remote node executor's.
> The planner already has good evaluation APIs for that. It is then the
> DBA's responsibility if customized expressions are defined as pushable
> when they should not be.

Do you mean the XXX comment I have written about the read-only vs
read-write transaction? Yeah, pgxc_node_begin() should just use
whatever the caller has specified. I think the comment would be more
appropriately placed in do_query() or any other such functions. I just
wanted to add a caution about treating read-write statements as
read-only, thus possibly compromising database state.

> 9) It is a design thought, but basically write and read connections are part
> of the current transaction, so why not put them in TransactionData
> instead of RemoteXactState in execRemote.c? Just wondering...

I tried to put the remote transaction handling code in execRemote.c
and the TransactionData structure is opaque to other modules. Also,
the nodes are tracked in execRemote.c and also used in the same
module. So it made sense to have them as static members in
execRemote.c

> 10) There are warnings at code compilation

Will look at them. Thanks.

> It is pretty sad that the error messages are not correctly returned back to
> the client even if transaction handling is cleaner and the basics are in.
> This is not an issue related to your patch though.

Yeah. That's one thing I am wondering about too. Especially, when
different nodes report failures for different reasons at PREPARE or
COMMIT time, which error should we report to the client? This is not
just an issue with transaction management, but wherever we have used
a combiner, I wonder how we report multiple failures to the client.
Should we encapsulate that as some generic error message from the
coordinator or should we report the first error or all errors from all
the connections? I think we need to take a call on this.

Thanks,
Pavan
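
A minimal sketch of point 2 above: a per-transaction flag that is cleared on every path that ends a transaction. This is a Python illustration only; the real `TransactionData` is a C struct in PostgreSQL's xact.c, and the class layout and method names here (`set_local_parameter`, `commit`, `abort`, `prepare`, `_reset_xact_flags`) are hypothetical stand-ins for CommitTransaction, AbortTransaction and PrepareTransaction.

```python
class TransactionData:
    """Hypothetical stand-in for the per-transaction state in the thread."""

    def __init__(self) -> None:
        self.is_local_parameter_used = False

    def set_local_parameter(self) -> None:
        # Mark that a local parameter was set in this transaction, so it
        # has to be replayed after BEGIN on remote connections.
        self.is_local_parameter_used = True

    def _reset_xact_flags(self) -> None:
        # Single place that clears per-transaction flags, so no end-of-
        # transaction path can forget one of them.
        self.is_local_parameter_used = False

    def commit(self) -> None:
        self._reset_xact_flags()

    def abort(self) -> None:
        self._reset_xact_flags()

    def prepare(self) -> None:
        self._reset_xact_flags()
```

Routing all three exits through one reset helper is a design choice of this sketch, not something the thread prescribes; it simply makes the "reset on commit, abort and prepare" requirement hard to miss.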
From: Michael P. <mic...@gm...> - 2012-02-14 01:49:21

Hi Pavan,

I noticed that your code is not completely realigned with current head.
I added with my last commit a flag in TransactionData to track if local
parameters need to be fired for the current transaction after launching
begin on remote connections.
You indeed deleted a loooot of code which will make future merges even
easier.

I also have a couple of questions/comments.
1) Isn't it dangerous to remove the lock on vacuum analyze? They need to
be added to the local snapshot.
We saw some visibility issues in this area in the past and I am
wondering if this may raise issues, you are the autovacuum specialist
though.
2) You may need to add the handling for local parameters with the flag
isLocalParameterUsed in TransactionData. This flag should be reset
correctly when a transaction is finished (AbortTransaction,
CommitTransaction, PrepareTransaction).
3) I found an assertion error at portal cleanup with your current patch
for the test privileges.
(gdb) bt
#0 0x00007fad4fe85a75 in raise () from /lib/libc.so.6
#1 0x00007fad4fe895c0 in abort () from /lib/libc.so.6
#2 0x0000000000857b76 in ExceptionalCondition (conditionName=0xa44a90 "!(portal->cleanup == ((void *)0))", errorType=0xa4471c "FailedAssertion", fileName=0xa44710 "portalmem.c", lineNumber=792) at assert.c:57
#3 0x0000000000881cae in AtCleanup_Portals () at portalmem.c:792
#4 0x00000000004b96c1 in CleanupTransaction () at xact.c:2720
#5 0x00000000004bb27c in AbortOutOfAnyTransaction () at xact.c:4173
#6 0x000000000086ab7f in ShutdownPostgres (code=0, arg=0) at postinit.c:975
#7 0x000000000072ecfc in shmem_exit (code=0) at ipc.c:221
#8 0x000000000072ebfe in proc_exit_prepare (code=0) at ipc.c:181
#9 0x000000000072eb65 in proc_exit (code=0) at ipc.c:96
#10 0x00000000007566fe in PostgresMain (argc=2, argv=0x2a52ee8, username=0x2a52c30 "michael") at postgres.c:4372
#11 0x00000000006fc3b2 in BackendRun (port=0x2a81e50) at postmaster.c:3763
#12 0x00000000006fba32 in BackendStartup (port=0x2a81e50) at postmaster.c:3448
#13 0x00000000006f897e in ServerLoop () at postmaster.c:1539
#14 0x00000000006f811f in PostmasterMain (argc=7, argv=0x2a4fba0) at postmaster.c:1200
#15 0x0000000000662d65 in main (argc=7, argv=0x2a4fba0) at main.c:199
As you mentioned before, this may be a PostgreSQL bug.
4) There are minor issues with tests guc, temp (functions using
temporary objects in their expressions),
5) Why not replace the error message with an assert in
GenerateBeginCommand? The sooner the better to detect bugs.
6) In AtEOXact_GlobalTxn, you can use something like
CommitPreparedTranGTM in gtm.c to abort 2 transaction ids at the same
time :)
7) I saw that AtPrepare_Remote includes the local node, this is not
always necessary for explicit 2PC as you know... This should be modified
when you add support for that.
8) In pgxc_node_begin, we shouldn't really worry about non-shippable
expressions that may be sent down to datanodes. I think it is more the
planner's responsibility to evaluate that, not the remote node
executor's. The planner already has good evaluation APIs for that. It is
then the DBA's responsibility if customized expressions are defined as
pushable when they should not be.
9) It is a design thought, but basically write and read connections are
part of the current transaction, so why not put them in TransactionData
instead of RemoteXactState in execRemote.c? Just wondering...
10) There are warnings at code compilation

It is pretty sad that the error messages are not correctly returned back
to the client even if transaction handling is cleaner and the basics are
in. This is not an issue related to your patch though.

Regards,

On Mon, Feb 13, 2012 at 10:28 PM, Pavan Deolasee <pav...@gm...> wrote:

> Hi All,
>
> PFA a patch which does the first round of transaction management
> refactoring. Over time, the transaction management code at the
> coordinator had become quite unreadable and error prone, so I made an
> attempt to clean that up a bit. There are still some concerns marked
> with XXX or PGXC TODO in the code, but I think the flow is now much
> better.
>
> We track the nodes involved in a global transaction as and when we send
> BEGIN commands to the node. The nodes are tracked as either read-only
> participants or read-write participants. This is important to later
> decide whether to perform a 2PC or a simple commit on the remote
> nodes. We right now rely on the planner information (and the patch
> does not change that much) to know if a transaction is doing any write
> activity or not. This can be problematic in the presence of volatile
> functions that can change the database state.
>
> The decision whether to run a statement in auto-commit mode on the
> data node or within a transaction block is also hard. We can run a
> statement in auto-commit mode if and only if one data node is involved
> in the transaction AND the statement will be sent to the data node
> only once. Before this work, there were places where we were not
> encapsulating statements in a transaction block, possibly leading to
> wrong results. After fixing this, there is a chance that we might see
> some performance impact.
>
> There are many cleanups. Here is a list, but probably not complete:
>
> 1. We now have only one member (transactionId) to track local or
> global transaction identifier. Since we should always be using the
> global identifier in XC, this reduces the risk of wrong IDs being
> assigned
> 2. I removed the additional transaction state - TBLOCK_END_NOT_GTM. It
> wasn't clear why we need this
> 3. 2PC is now hooked into PrepareTransaction/CommitTransaction.
> Similarly, rollbacks and part commits are handled in AbortTransaction
> 4. The GTM transaction termination is handled in
> Commit/AbortTransaction. That gives us a single entry point for
> transaction management
> 5. Instead of various flags, we now just track if the local node is
> involved in the transaction or not. That helps us decide whether to
> perform 2PC on the local and remote node. This can further be enhanced
> by looking at some other existing globals that PostgreSQL maintains
> 6. I have ripped out many other functions which are either not needed
> now or probably needed for explicit 2PC. I will take another look at
> them when I add back support for explicit 2PC
>
> There is some further work needed regarding error handling. For
> example, the combiner logic is cool to collect results from multiple
> sources, but we fail to track errors at individual nodes. This might
> be important to drive the 2PC even more cleanly. Another piece to look
> at carefully is the places where we have hooked the remote transaction
> handling. While I did not want to run remote communication while
> interrupts are disabled, we need to ensure that this does not leave
> any gaps in transaction management.
>
> I have also attached a small patch that fixes some trivial bugs where
> we were not setting the is_temp flag correctly at some places. I have
> commented out an assertion which is really a PostgreSQL bug, but gets
> highlighted when the explicit 2PC support is disabled.
>
> Please look at the patches and let me know your comments.
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com
>
> _______________________________________________
> Postgres-xc-developers mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers

--
Michael Paquier
http://michael.otacoo.com
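
A minimal sketch of the participant tracking and the auto-commit rule described in the original patch announcement, written in Python rather than the C of the actual patch. `RemoteXactState` borrows the name mentioned in the thread, but its fields and the helpers `register_participant`, `choose_commit_protocol` and `can_autocommit` are hypothetical. The 2PC condition used here (two or more writers, counting the local node) is an assumption; the thread does not spell out the exact rule.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RemoteXactState:
    """Hypothetical model of the nodes taking part in one transaction."""
    read_nodes: Set[str] = field(default_factory=set)
    write_nodes: Set[str] = field(default_factory=set)
    local_node_wrote: bool = False

    def register_participant(self, node: str, read_only: bool) -> None:
        # Called when BEGIN is sent to a node. Once a node has written,
        # it stays a read-write participant for the whole transaction.
        if read_only:
            if node not in self.write_nodes:
                self.read_nodes.add(node)
        else:
            self.read_nodes.discard(node)
            self.write_nodes.add(node)


def choose_commit_protocol(state: RemoteXactState) -> str:
    # Assumption (not stated in the thread): two-phase commit is needed
    # only when at least two nodes, counting the local one, wrote data.
    writers = len(state.write_nodes) + (1 if state.local_node_wrote else 0)
    return "2pc" if writers > 1 else "simple"


def can_autocommit(nodes_in_xact: int, times_sent: int) -> bool:
    # Rule from the announcement: auto-commit is allowed if and only if
    # one data node is involved AND the statement is sent to it once.
    return nodes_in_xact == 1 and times_sent == 1
```

Because the read-only versus read-write classification comes from planner information, a volatile function that writes behind a statement classified as read-only would be registered as a read-only participant in this model too, which is the risk the announcement points out.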