From: Michael P. <mic...@gm...> - 2010-12-22 02:12:43
On Wed, Dec 22, 2010 at 10:23 AM, Mason Sharp <mas...@en...> wrote:
> After "make clean; make", things look better.

Thanks for taking the time to check that.

> I found another issue though. Still, you can go ahead and commit this
> since it is close, in order to make merging easier.

I'll do it, thanks.

> If the coordinator tries to commit the prepared transactions, and it
> sends COMMIT PREPARED to one of the nodes but is killed before it can
> send it to the other, then if I restart the coordinator I see the data
> from only one of the nodes (GTM closed the transaction), which is not
> atomic. The second data node is still alive and was the entire time.

That is true. If the coordinator crashes, GTM closes all the backends of
transactions that it considers open. In the case of an implicit COMMIT,
even if we prepare/commit on the nodes, the transaction is still seen as
open on GTM.

> I fear we may have to treat implicit transactions similarly to explicit
> transactions. (BTW, do we handle explicit properly for these similar
> cases, too?) If we stick with performance short cuts it is hard to be
> reliably atomic. (Again, I will take the blame for trying to speed
> things up. Perhaps we can have it as a configuration option if people
> have a lot of implicit 2PC going on and understand the risks.)

Yeah, I think so. A GUC parameter would do the trick, but I'd like to
discuss this more before deciding anything.

> Anyway, the transaction would remain open, but it would have to be
> resolved somehow.
>
> If we had a "transaction clean up" thread in GTM, it could note the
> transaction information and periodically try to connect to the
> registered nodes and resolve according to the rules we have talked
> about. (Again, some of this code could be in some of the recovery tools
> you are writing, too.) The nice thing about doing something like this is
> we can automate things as much as possible and not require DBA
> intervention; if a non-GTM component goes down and comes up again,
> things will resolve by themselves. I suppose if it is GTM itself that
> went down, once it rebuilds state properly, this same mechanism could be
> called at the end of GTM recovery and resolve the outstanding issues.

That is more or less what we are planning to do with the utility that
will check the remaining 2PC transactions after a Coordinator crash. This
utility would be kicked off by the monitoring agent when it notices a
Coordinator crash. This feature needs two things:
1) a fix for EXECUTE DIRECT
2) an extension of the 2PC table (patch already written but not yet
realigned with the latest 2PC code)

> I think we need to walk through every step in the commit sequence and
> kill an involved process and verify that we have a consistent view of
> the database afterward, and that we have the ability/tools to resolve
> it. This code requires careful testing.

That's true; this code could easily lead to unexpected issues by playing
with 2PC.
-- 
Michael Paquier
http://michaelpq.users.sourceforge.net
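The resolution rule discussed in the thread (decide the fate of an in-doubt prepared transaction from the states the registered nodes report) can be sketched roughly as follows. This is an illustrative model only, not actual GTM or Postgres-XC code; the function name, state labels, and data shapes are all hypothetical:

```python
# Hypothetical sketch of the in-doubt 2PC resolution rule discussed above.
# Not actual GTM/Postgres-XC code: names and states are illustrative only.

def resolve_in_doubt(node_states):
    """Decide the outcome of an in-doubt prepared transaction.

    node_states maps node name -> one of:
      "committed" - node already ran COMMIT PREPARED
      "prepared"  - node still holds the prepared transaction
      "aborted"   - node already rolled the transaction back
    Rule: if any node committed, the coordinator must have decided to
    commit, so the remaining prepared nodes get COMMIT PREPARED;
    otherwise it is safe to abort everywhere.
    """
    states = set(node_states.values())
    if "committed" in states and "aborted" in states:
        # Should be impossible under 2PC; flag for manual DBA repair.
        raise RuntimeError("inconsistent cluster: mixed commit/abort")
    decision = "commit" if "committed" in states else "abort"
    # Actions to replay on nodes still holding the prepared transaction.
    actions = {node: decision
               for node, st in node_states.items() if st == "prepared"}
    return decision, actions

# Example: coordinator died after committing on node1 only.
decision, actions = resolve_in_doubt(
    {"node1": "committed", "node2": "prepared"})
print(decision, actions)  # commit {'node2': 'commit'}
```

A periodic clean-up thread would run this decision over each transaction left open on GTM and replay the resulting actions against the nodes, which is what makes the recovery automatic rather than requiring DBA intervention.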
From: Mason S. <mas...@en...> - 2010-12-22 01:23:35
On 12/21/10 3:33 AM, Michael Paquier wrote:
> Could you give me more details about this crash?

After "make clean; make", things look better.

I found another issue though. Still, you can go ahead and commit this
since it is close, in order to make merging easier.

If the coordinator tries to commit the prepared transactions, and it
sends COMMIT PREPARED to one of the nodes but is killed before it can
send it to the other, then if I restart the coordinator I see the data
from only one of the nodes (GTM closed the transaction), which is not
atomic. The second data node is still alive and was the entire time.

I fear we may have to treat implicit transactions similarly to explicit
transactions. (BTW, do we handle explicit properly for these similar
cases, too?) If we stick with performance short cuts it is hard to be
reliably atomic. (Again, I will take the blame for trying to speed things
up. Perhaps we can have it as a configuration option if people have a lot
of implicit 2PC going on and understand the risks.)

Anyway, the transaction would remain open, but it would have to be
resolved somehow.

If we had a "transaction clean up" thread in GTM, it could note the
transaction information and periodically try to connect to the registered
nodes and resolve according to the rules we have talked about. (Again,
some of this code could be in some of the recovery tools you are writing,
too.) The nice thing about doing something like this is we can automate
things as much as possible and not require DBA intervention; if a non-GTM
component goes down and comes up again, things will resolve by
themselves. I suppose if it is GTM itself that went down, once it
rebuilds state properly, this same mechanism could be called at the end
of GTM recovery and resolve the outstanding issues.

I think we need to walk through every step in the commit sequence and
kill an involved process and verify that we have a consistent view of the
database afterward, and that we have the ability/tools to resolve it.

This code requires careful testing.

Thanks,
Mason

> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net

-- 
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have
received this e-mail in error, please notify the sender immediately by
reply e-mail and delete this message.
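The testing approach Mason describes, killing the coordinator at each step of the commit sequence and checking that the cluster can still reach a consistent state, can be modeled in miniature like this. It is a hypothetical simulation, not the real Postgres-XC test harness; node names, state labels, and the recovery rule are all illustrative:

```python
# Hypothetical miniature of the fault-injection walk-through Mason
# describes: "crash" the coordinator after each step of the COMMIT
# PREPARED fan-out and check recovery always restores atomicity.

NODES = ["node1", "node2", "node3"]

def run_commit_with_crash(crash_after):
    """Send COMMIT PREPARED node by node, crashing after crash_after sends."""
    states = {n: "prepared" for n in NODES}
    for i, n in enumerate(NODES):
        if i == crash_after:
            break  # coordinator killed mid-sequence
        states[n] = "committed"
    return states

def resolve(states):
    """Recovery rule: any committed node forces commit everywhere."""
    decision = "committed" if "committed" in states.values() else "aborted"
    return {n: decision if st == "prepared" else st
            for n, st in states.items()}

# Walk every crash point, including "no crash" (crash_after == len(NODES)).
for crash_after in range(len(NODES) + 1):
    final = resolve(run_commit_with_crash(crash_after))
    # Atomicity: after recovery, every node must agree on the outcome.
    assert len(set(final.values())) == 1, final
print("all crash points recover atomically")
```

The real test would of course kill actual coordinator and data node processes rather than flip dictionary entries, but the loop structure, one verification pass per crash point in the commit sequence, is the same idea.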