From: Mason S. <mas...@en...> - 2010-12-14 23:26:41
|
> Hi all,
>
> Here is the fix I propose based on the idea I proposed in a previous mail.
> If a prepared transaction, partially committed, is aborted, this patch
> gathers the handles to nodes where an error occurred and saves them on GTM.
>
> The prepared transaction partially committed is kept alive on GTM, so
> other transactions cannot see the partially committed results.
> To complete the commit of the prepared transaction partially committed,
> it is necessary to issue a COMMIT PREPARED 'gid'.
> Once this command is issued, transaction will finish its commit properly.
>
> Mason, this solves the problem you saw when you made your tests.
> It also respects the rule that a 2PC transaction partially committed
> has to be committed.

Just took a brief look so far. Seems better. I understand that recovery and HA
are in development, that things are being done to lay the groundwork and
improve, and that with this patch we are not yet trying to handle any and every
situation. What happens, though, if the coordinator fails before it can update
GTM?

Also, I did a test and got this:

WARNING:  unexpected EOF on datanode connection
WARNING:  Connection to Datanode 1 has unexpected state 1 and will be dropped
ERROR:  Could not commit prepared transaction implicitely
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

#0  0x907afe42 in kill$UNIX2003 ()
#1  0x9082223a in raise ()
#2  0x9082e679 in abort ()
#3  0x003917ce in ExceptionalCondition (conditionName=0x433f6c "!(((proc->xid) != ((TransactionId) 0)))", errorType=0x3ecfd4 "FailedAssertion", fileName=0x433f50 "procarray.c", lineNumber=283) at assert.c:57
#4  0x00280916 in ProcArrayEndTransaction (proc=0x41cca70, latestXid=1018) at procarray.c:283
#5  0x0005905c in AbortTransaction () at xact.c:2525
#6  0x00059a6e in AbortCurrentTransaction () at xact.c:3001
#7  0x00059b10 in AbortCurrentTransactionOnce () at xact.c:3094
#8  0x0029c8d6 in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8 "masonsharp") at postgres.c:3622
#9  0x0025851c in BackendRun (port=0x7016f0) at postmaster.c:3607
#10 0x00257883 in BackendStartup (port=0x7016f0) at postmaster.c:3216
#11 0x002542b5 in ServerLoop () at postmaster.c:1445
#12 0x002538c1 in PostmasterMain (argc=5, argv=0x7005a0) at postmaster.c:1098
#13 0x001cf2f1 in main (argc=5, argv=0x7005a0) at main.c:188

I did the same test as before: I killed a data node after it received a COMMIT
PREPARED message. I think we should be able to continue. The good news is that
I should not see partially committed data, and I do not. But if I try to
manually commit it from a new connection to the coordinator:

mds=# COMMIT PREPARED 'T1018';
ERROR:  Could not get GID data from GTM

Maybe GTM removed this info when the coordinator disconnected? (Or maybe
implicit transactions are only associated with a certain connection?) I can see
the transaction on one data node, but not the other.

Ideally we would come up with a scheme where, if the coordinator session does
not notify GTM, we can somehow recover. Maybe this is my fault; I believe I
advocated avoiding the extra work for implicit 2PC in the name of
performance. :-)

We can think about what to do in the short term, and how to handle it in the
long term. In the short term, your approach may be good enough once debugged,
since it is a relatively rare case.

Long term, we could think about a thread that runs on GTM and wakes up every 30
or 60 seconds or so (configurable), collects implicit transactions from the
nodes (extension to pg_prepared_xacts required?), and, if it sees that an XID
does not have an associated live connection, knows that something went awry. It
then checks whether the transaction committed on any of the nodes. If not, roll
it back on all of them; if it committed on at least one, commit it on all. If
one of the data nodes is down, it does nothing, perhaps logging a warning. This
would avoid user intervention, and would be pretty cool. Some of this code you
may already have been working on for recovery, and we could reuse it here.

Regards,

Mason

> Thanks,
>
> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
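For illustration, here is the kind of per-node check such an agent (or a DBA,
today) could run; the 60-second threshold and the 'T<gxid>' GID convention are
assumptions taken from the tests in this thread, not something the patch
provides:

    -- On each Coordinator and Datanode: list implicit 2PC transactions that
    -- have stayed prepared longer than the polling interval (illustrative).
    SELECT gid, transaction, prepared, database
    FROM   pg_prepared_xacts
    WHERE  gid LIKE 'T%'
      AND  prepared < now() - interval '60 seconds';

    -- If the same GID is already committed on at least one node, finish the
    -- commit everywhere it is still prepared; otherwise roll it back everywhere.
    COMMIT PREPARED 'T1018';
    -- ROLLBACK PREPARED 'T1018';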
From: Michael P. <mic...@gm...> - 2010-12-14 08:07:59
|
Hi all,

Here is the fix I propose based on the idea I proposed in a previous mail. If a
prepared transaction, partially committed, is aborted, this patch gathers the
handles to the nodes where an error occurred and saves them on GTM.

The partially committed prepared transaction is kept alive on GTM, so other
transactions cannot see the partially committed results. To complete its
commit, it is necessary to issue a COMMIT PREPARED 'gid'. Once this command is
issued, the transaction finishes its commit properly.

Mason, this solves the problem you saw when you ran your tests. It also
respects the rule that a 2PC transaction that has been partially committed has
to be committed.

Thanks,

--
Michael Paquier
http://michaelpq.users.sourceforge.net
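For reference, the manual completion step described above looks roughly like
this; the GID 'T10312' is the one observed on the failed Datanode in Mason's
test elsewhere in this thread, and looking it up via pg_prepared_xacts is
standard PostgreSQL rather than something added by the patch:

    -- On a node where the transaction is still only prepared:
    SELECT gid, prepared FROM pg_prepared_xacts;
    --   gid   |           prepared
    -- --------+-------------------------------
    --  T10312 | 2010-12-12 12:04:30.946287-05

    -- Then, from a Coordinator session, finish the partial commit:
    COMMIT PREPARED 'T10312';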
From: Koichi S. <koi...@gm...> - 2010-12-14 01:15:11
|
Hi, please see inline... ---------- Koichi Suzuki 2010/12/13 Mason Sharp <mas...@en...>: > On 12/12/10 9:28 PM, Michael Paquier wrote: >> >> I reviewed, and I thought it looked good, except for a possible issue with >> committing. >> >> I wanted to test what happened with implicit transactions when there was a >> failure. >> >> I executed this in one session: >> >> mds1=# begin; >> BEGIN >> mds1=# insert into mds1 values (1,1); >> INSERT 0 1 >> mds1=# insert into mds1 values (2,2); >> INSERT 0 1 >> mds1=# commit; >> >> Before committing, I fired up gdb for a coordinator session and a data >> node session. >> >> On one of the data nodes, when the COMMIT PREPARED was received, I killed >> the backend to see what would happen. On the Coordinator I saw this: >> >> >> WARNING: unexpected EOF on datanode connection >> WARNING: Connection to Datanode 1 has unexpected state 1 and will be >> dropped >> WARNING: Connection to Datanode 2 has unexpected state 1 and will be >> dropped >> >> ERROR: Could not commit prepared transaction implicitely >> PANIC: cannot abort transaction 10312, it was already committed >> server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> The connection to the server was lost. Attempting reset: Failed. >> >> I am not sure we should be aborting 10312, since it was committed on one >> of the nodes. It corresponds to the original prepared transaction. We also >> do not want a panic to happen. > > This has to be corrected. > If a PANIC happens on Coordinators each time a Datanode crashes, a simple > node crash would mess up the whole cluster. > It is a real problem I think. > > Yes. > > >> >> Next, I started a new coordinator session: >> >> mds1=# select * from mds1; >> col1 | col2 >> ------+------ >> 2 | 2 >> (1 row) >> >> >> I only see one of the rows. I thought, well, ok, we cannot undo a commit, >> and the other one must commit eventually. I was able to continue working >> normally: >> >> mds1=# insert into mds1 values (3,3); >> INSERT 0 1 >> mds1=# insert into mds1 values (4,4); >> INSERT 0 1 >> mds1=# insert into mds1 values (5,5); >> INSERT 0 1 >> mds1=# insert into mds1 values (6,6); >> INSERT 0 1 Are these statements run as a transaction block or did they run as "autocommit" statements? >> >> mds1=# select xmin,* from mds1; >> xmin | col1 | col2 >> -------+------+------ >> 10420 | 4 | 4 >> 10422 | 6 | 6 >> 10312 | 2 | 2 >> 10415 | 3 | 3 >> 10421 | 5 | 5 >> (5 rows) >> >> >> Note xmin keeps increasing because we closed the transaction on GTM at the >> "finish:" label. This may or may not be ok. > > This should be OK, no? If the above statements ran in "autocommit" mode, each statement ran as separate transaction. Xmin just indicates GXID which "created" the row. To determine if it is visible or not, we have to visit CLOG (if GXID is not "frozen") and the list of live transactions to see if it is running, committed or aborted. Then we can determine if a given row should be visible or not. Therefore, if the creator transaction is left just "PREPARED", the creator transaction information will remain in PgProc and is regarded "running", thus it should be regarded "invisible" from other transactions. Similar consideration should be made to see "xmac" value of the row, in the case of "update" or "delete" statement. Hope it helps. --- Koichi Suzuki > > Not necessarily. 
> > >> >> Meanwhile, on the failed data node: >> >> mds1=# select * from pg_prepared_xacts; >> WARNING: Do not have a GTM snapshot available >> WARNING: Do not have a GTM snapshot available >> transaction | gid | prepared | owner | >> database >> >> -------------+--------+-------------------------------+------------+---------- >> 10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx | mds1 >> (1 row) >> >> The transaction id is 10312. Normally this would still appear in >> snapshots, but we close it on GTM. >> >> What should we do? >> >> - We could leave as is. We may in the future have an XC monitoring process >> look for possible 2PC anomalies occasionally and send an alert so that they >> could be resolved by a DBA. > > I was thinking about an external utility that could clean up partially > committed or prepared transactions when a node crash happens. > This is a part of HA, so I think the only thing that should be corrected now > is the way errors are managed in the case of a partially committed prepared > transaction on nodes. > A PANIC is not acceptable for this case. > >> >> - We could instead choose not close out the transaction on GTM, so that >> the xid is still in snapshots. We could test if the rows are viewable or >> not. This could result in other side effects, but without further testing, I >> am guessing this may be similar to when an existing statement is running and >> cannot see a previously committed transaction that is open in its snapshot. >> So, I am thinking this is probably the preferable option (keeping it open on >> GTM until committed on all nodes), but we should test it. In any event, we >> should also fix the panic. > > If we let it open the transaction open on GTM, how do we know the GXID that > has been used for Commit (different from the one that has been used for > PREPARE as I recall)? > > We can test the behavior to see if it is ok to close this one out, > otherwise, we have more work to do... > > If we do a Commit prepare on the remaining node that crashed, we have to > commit the former PREPARE GXID, the former COMMIT PREPARED GXID and also the > GXID that is used to issue the new COMMIT PREPARED on the remaining node. > > It is easy to get the GXID used for former PREPARE and new COMMIT PREPARED. > But there is no real way yet to get back the GXID used for the former COMMIT > PREPARE. > I would see two ways to correct that: > 1) Save the former COMMIT PREPARED GXID in GTM, but this would really impact > performance. > 2) Save the COMMIT PREPARED GXID on Coordinator and let the GXACT open on > Coordinator (would be the best solution, but the transaction has already > been committed on Coordinator). > > I think we need to research the effects of this and see how the system > behaves if the partially failed commit prepared GXID is closed. I suppose it > could cause a problem with viewing pg_prepared_xacts. We don't want the > hint bits to get updated.... well, the first XID will be lower, so the lower > open xmin should keep this from having the tuple frozen. > > That's why I think the transaction should be to close the transaction on > GTM, and a monitoring agent would be in charge to commit on the remaining > nodes that crashed if a partial COMMIT has been done. > > From above, the node is still active and the query after the transaction is > returning partial results. It should be an all or nothing operation. If we > close the transaction on GTM, then it means that Postgres-XC is not atomic. > I think it is important to be ACID compliant. 
> > I think we should fix the panic, then test how the system behaves if, even > though the transaction is committed on one node, if we keep the transaction > open. The XID will appear in all the snapshots and the row should not be > viewable, and we can make sure that vacuum is also ok (should be). If it > works ok, then I think we should keep the transaction open on GTM until all > components have committed. > > > Btw, it is a complicated point, so other's opinion is completely welcome. > > Yes. > > Thanks, > > Mason > > Regards, > > -- > Michael Paquier > http://michaelpq.users.sourceforge.net > > > > -- > Mason Sharp > EnterpriseDB Corporation > The Enterprise Postgres Company > > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > > ------------------------------------------------------------------------------ > Oracle to DB2 Conversion Guide: Learn learn about native support for PL/SQL, > new data types, scalar functions, improved concurrency, built-in packages, > OCI, SQL*Plus, data movement tools, best practices and more. > http://p.sf.net/sfu/oracle-sfdev2dev > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > |
From: Michael P. <mic...@gm...> - 2010-12-14 00:59:53
|
>> mds1=# select xmin,* from mds1;
>>  xmin  | col1 | col2
>> -------+------+------
>>  10420 |    4 |    4
>>  10422 |    6 |    6
>>  10312 |    2 |    2
>>  10415 |    3 |    3
>>  10421 |    5 |    5
>> (5 rows)
>>
>> Note xmin keeps increasing because we closed the transaction on GTM at the
>> "finish:" label. This may or may not be ok.
>>
>> This should be OK, no?
>
> Not necessarily.

I see: the transaction has been only partially committed, so the xmin should
keep the value of the oldest GXID (in this case the one that has not been
completely committed).

>> If we let the transaction open on GTM, how do we know the GXID that has been
>> used for Commit (different from the one that has been used for PREPARE as I
>> recall)?
>
> We can test the behavior to see if it is ok to close this one out;
> otherwise, we have more work to do...

OK, I see, so do not commit the transaction on GTM...

With the current patch, we can know whether implicit 2PC is used thanks to the
CommitTransactionID I added in GlobalTransactionData for implicit 2PC. If this
value is set, it means that the transaction has been committed on the
Coordinator and that this Coordinator is using implicit 2PC. This value being
set also means that the nodes are partially committed or completely prepared.

Here is my proposal. When an ABORT happens and CommitTransactionID is set, we
do not commit the transaction ID used for PREPARE, but we commit
CommitTransactionID (no effect on visibility). On the other hand, we register
the transaction as still prepared on GTM when the abort happens. This could be
done with the API used for explicit 2PC. Then, if there is a conflict, the DBA
or a monitoring tool could use explicit 2PC to finish the commit of the
partially prepared transaction. This could do the trick. What do you think
about that?

> I think we should fix the panic, then test how the system behaves if, even
> though the transaction is committed on one node, we keep the transaction
> open. The XID will appear in all the snapshots and the row should not be
> viewable, and we can make sure that vacuum is also ok (should be). If it
> works ok, then I think we should keep the transaction open on GTM until all
> components have committed.

The PANIC can be easily fixed. Without testing, I would say that the system may
be OK, as the transaction ID is still kept alive in snapshots. With that, the
transaction is seen as alive in the cluster.

--
Michael Paquier
http://michaelpq.users.sourceforge.net
From: Mason S. <mas...@en...> - 2010-12-13 15:04:14
|
On 12/12/10 9:28 PM, Michael Paquier wrote: > > > I reviewed, and I thought it looked good, except for a possible > issue with committing. > > I wanted to test what happened with implicit transactions when > there was a failure. > > I executed this in one session: > > mds1=# begin; > BEGIN > mds1=# insert into mds1 values (1,1); > INSERT 0 1 > mds1=# insert into mds1 values (2,2); > INSERT 0 1 > mds1=# commit; > > Before committing, I fired up gdb for a coordinator session and a > data node session. > > On one of the data nodes, when the COMMIT PREPARED was received, I > killed the backend to see what would happen. On the Coordinator I > saw this: > > > WARNING: unexpected EOF on datanode connection > WARNING: Connection to Datanode 1 has unexpected state 1 and will > be dropped > WARNING: Connection to Datanode 2 has unexpected state 1 and will > be dropped > > ERROR: Could not commit prepared transaction implicitely > PANIC: cannot abort transaction 10312, it was already committed > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > I am not sure we should be aborting 10312, since it was committed > on one of the nodes. It corresponds to the original prepared > transaction. We also do not want a panic to happen. > > This has to be corrected. > If a PANIC happens on Coordinators each time a Datanode crashes, a > simple node crash would mess up the whole cluster. > It is a real problem I think. Yes. > > > Next, I started a new coordinator session: > > mds1=# select * from mds1; > col1 | col2 > ------+------ > 2 | 2 > (1 row) > > > I only see one of the rows. I thought, well, ok, we cannot undo a > commit, and the other one must commit eventually. I was able to > continue working normally: > > mds1=# insert into mds1 values (3,3); > INSERT 0 1 > mds1=# insert into mds1 values (4,4); > INSERT 0 1 > mds1=# insert into mds1 values (5,5); > INSERT 0 1 > mds1=# insert into mds1 values (6,6); > INSERT 0 1 > > mds1=# select xmin,* from mds1; > xmin | col1 | col2 > -------+------+------ > 10420 | 4 | 4 > 10422 | 6 | 6 > 10312 | 2 | 2 > 10415 | 3 | 3 > 10421 | 5 | 5 > (5 rows) > > > Note xmin keeps increasing because we closed the transaction on > GTM at the "finish:" label. This may or may not be ok. > > This should be OK, no? Not necessarily. > > > Meanwhile, on the failed data node: > > mds1=# select * from pg_prepared_xacts; > WARNING: Do not have a GTM snapshot available > WARNING: Do not have a GTM snapshot available > transaction | gid | prepared | owner > | database > -------------+--------+-------------------------------+------------+---------- > 10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx | mds1 > (1 row) > > The transaction id is 10312. Normally this would still appear in > snapshots, but we close it on GTM. > > What should we do? > > - We could leave as is. We may in the future have an XC monitoring > process look for possible 2PC anomalies occasionally and send an > alert so that they could be resolved by a DBA. > > I was thinking about an external utility that could clean up partially > committed or prepared transactions when a node crash happens. > This is a part of HA, so I think the only thing that should be > corrected now is the way errors are managed in the case of a partially > committed prepared transaction on nodes. > A PANIC is not acceptable for this case. 
> > > - We could instead choose not close out the transaction on GTM, so > that the xid is still in snapshots. We could test if the rows are > viewable or not. This could result in other side effects, but > without further testing, I am guessing this may be similar to when > an existing statement is running and cannot see a previously > committed transaction that is open in its snapshot. So, I am > thinking this is probably the preferable option (keeping it open > on GTM until committed on all nodes), but we should test it. In > any event, we should also fix the panic. > > > If we let it open the transaction open on GTM, how do we know the GXID > that has been used for Commit (different from the one that has been > used for PREPARE as I recall)? We can test the behavior to see if it is ok to close this one out, otherwise, we have more work to do... > If we do a Commit prepare on the remaining node that crashed, we have > to commit the former PREPARE GXID, the former COMMIT PREPARED GXID and > also the GXID that is used to issue the new COMMIT PREPARED on the > remaining node. > It is easy to get the GXID used for former PREPARE and new COMMIT > PREPARED. But there is no real way yet to get back the GXID used for > the former COMMIT PREPARE. > I would see two ways to correct that: > 1) Save the former COMMIT PREPARED GXID in GTM, but this would really > impact performance. > 2) Save the COMMIT PREPARED GXID on Coordinator and let the GXACT open > on Coordinator (would be the best solution, but the transaction has > already been committed on Coordinator). > I think we need to research the effects of this and see how the system behaves if the partially failed commit prepared GXID is closed. I suppose it could cause a problem with viewing pg_prepared_xacts. We don't want the hint bits to get updated.... well, the first XID will be lower, so the lower open xmin should keep this from having the tuple frozen. > That's why I think the transaction should be to close the transaction > on GTM, and a monitoring agent would be in charge to commit on the > remaining nodes that crashed if a partial COMMIT has been done. From above, the node is still active and the query after the transaction is returning partial results. It should be an all or nothing operation. If we close the transaction on GTM, then it means that Postgres-XC is not atomic. I think it is important to be ACID compliant. I think we should fix the panic, then test how the system behaves if, even though the transaction is committed on one node, if we keep the transaction open. The XID will appear in all the snapshots and the row should not be viewable, and we can make sure that vacuum is also ok (should be). If it works ok, then I think we should keep the transaction open on GTM until all components have committed. > > Btw, it is a complicated point, so other's opinion is completely welcome. > Yes. Thanks, Mason > Regards, > > -- > Michael Paquier > http://michaelpq.users.sourceforge.net > -- Mason Sharp EnterpriseDB Corporation The Enterprise Postgres Company This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. 
If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: xiong w. <wan...@gm...> - 2010-12-13 08:46:20
|
Dears,

The attachment is a patch for multi-row INSERT. It assigns values according to
the distribution method. If the table is distributed by hash, the values are
routed to the appropriate datanodes according to the partition key. If the
table is distributed by round robin, the values are spread evenly across the
datanodes according to the round-robin pointer. Otherwise, the statement is not
processed.

Your advice will be appreciated.

Regards,
Benny
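For illustration, the two cases the patch is meant to cover would look like the
following; the exact spelling of the DISTRIBUTE BY clauses here is an
assumption about the Postgres-XC CREATE TABLE syntax, not taken from the patch:

    -- Hash distribution: each row of the multi-row INSERT is routed to the
    -- Datanode owning the hash bucket of its partition-key value.
    CREATE TABLE t_hash (a int, b int) DISTRIBUTE BY HASH (a);
    INSERT INTO t_hash VALUES (1, 10), (2, 20), (3, 30);

    -- Round-robin distribution: the rows are spread evenly across Datanodes.
    CREATE TABLE t_rr (a int, b int) DISTRIBUTE BY ROUND ROBIN;
    INSERT INTO t_rr VALUES (1, 10), (2, 20), (3, 30);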
From: 黄秋华 <ra...@16...> - 2010-12-13 07:25:38
|
Hi,

There is still an error in INSERT ... SELECT.

Create 3 tables:

CREATE TABLE INT4_TBL(f1 int4);
CREATE TABLE FLOAT8_TBL(f1 float8);
CREATE TABLE TEMP_GROUP (f1 INT4, f2 INT4, f3 FLOAT8);

Insert records:

insert into INT4_TBL values(1);
insert into FLOAT8_TBL values(1.0);
INSERT INTO TEMP_GROUP SELECT 1, (- i.f1), (- f.f1) FROM INT4_TBL i, FLOAT8_TBL f;

Now:

select * from TEMP_GROUP;

The result is 0 rows (one row is expected).
From: xiong w. <wan...@gm...> - 2010-12-13 07:19:40
|
Hi Mason,

Take a look at the following cases:

postgres=# CREATE TABLE atest5 (one int, two int, three int);
CREATE TABLE
postgres=# INSERT INTO atest5 (two) VALUES (3);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
postgres=# INSERT INTO atest5(three) VALUES (3);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Core was generated by `postgres: postgres postgres [local] INSERT '.
Program terminated with signal 11, Segmentation fault.
[New process 27754]
#0  0x0000000000530bbf in CopySendString (cstate=0x5c82238, str=0x0) at copy.c:450
450             appendBinaryStringInfo(cstate->fe_msgbuf, str, strlen(str));
(gdb) bt
#0  0x0000000000530bbf in CopySendString (cstate=0x5c82238, str=0x0) at copy.c:450
#1  0x000000000053407b in CopyOneRowTo (cstate=0x5c82238, tupleOid=0, values=0x5c82f68, nulls=0x5c82f98 "\001\001") at copy.c:1716
#2  0x0000000000538789 in DoInsertSelectCopy (estate=0x5c7da50, slot=0x5c81b88) at copy.c:4017
#3  0x000000000058b185 in ExecutePlan (estate=0x5c7da50, planstate=0x5c7f8e0, operation=CMD_INSERT, numberTuples=0, direction=ForwardScanDirection, dest=0x5c78860) at execMain.c:1698
#4  0x0000000000588f3c in standard_ExecutorRun (queryDesc=0x5c36330, direction=ForwardScanDirection, count=0) at execMain.c:312
#5  0x0000000000588e45 in ExecutorRun (queryDesc=0x5c36330, direction=ForwardScanDirection, count=0) at execMain.c:261
#6  0x000000000068f51a in ProcessQuery (plan=0x5c78780, sourceText=0x5c13ee0 "INSERT INTO atest5(three) VALUES (3);", params=0x0, dest=0x5c78860, completionTag=0x7fff6d345520 "") at pquery.c:205
#7  0x0000000000690c8b in PortalRunMulti (portal=0x5c79a30, isTopLevel=1 '\001', dest=0x5c78860, altdest=0x5c78860, completionTag=0x7fff6d345520 "") at pquery.c:1299
#8  0x000000000069036f in PortalRun (portal=0x5c79a30, count=9223372036854775807, isTopLevel=1 '\001', dest=0x5c78860, altdest=0x5c78860, completionTag=0x7fff6d345520 "") at pquery.c:843
#9  0x000000000068a64a in exec_simple_query (query_string=0x5c13ee0 "INSERT INTO atest5(three) VALUES (3);") at postgres.c:1054
#10 0x000000000068e5b8 in PostgresMain (argc=4, argv=0x5b69520, username=0x5b694e0 "postgres") at postgres.c:3767
#11 0x000000000065620a in BackendRun (port=0x5b8aba0) at postmaster.c:3607
#12 0x0000000000655767 in BackendStartup (port=0x5b8aba0) at postmaster.c:3216
#13 0x0000000000652b32 in ServerLoop () at postmaster.c:1445
#14 0x00000000006522d8 in PostmasterMain (argc=9, argv=0x5b668d0) at postmaster.c:1098
#15 0x00000000005d9c1f in main (argc=9, argv=0x5b668d0) at main.c:188

Regards,
Benny
From: Michael P. <mic...@gm...> - 2010-12-13 02:28:34
|
> > I reviewed, and I thought it looked good, except for a possible issue with > committing. > > I wanted to test what happened with implicit transactions when there was a > failure. > > I executed this in one session: > > mds1=# begin; > BEGIN > mds1=# insert into mds1 values (1,1); > INSERT 0 1 > mds1=# insert into mds1 values (2,2); > INSERT 0 1 > mds1=# commit; > > Before committing, I fired up gdb for a coordinator session and a data node > session. > > On one of the data nodes, when the COMMIT PREPARED was received, I killed > the backend to see what would happen. On the Coordinator I saw this: > > > WARNING: unexpected EOF on datanode connection > WARNING: Connection to Datanode 1 has unexpected state 1 and will be > dropped > WARNING: Connection to Datanode 2 has unexpected state 1 and will be > dropped > > ERROR: Could not commit prepared transaction implicitely > PANIC: cannot abort transaction 10312, it was already committed > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > I am not sure we should be aborting 10312, since it was committed on one of > the nodes. It corresponds to the original prepared transaction. We also do > not want a panic to happen. > This has to be corrected. If a PANIC happens on Coordinators each time a Datanode crashes, a simple node crash would mess up the whole cluster. It is a real problem I think. > Next, I started a new coordinator session: > > mds1=# select * from mds1; > col1 | col2 > ------+------ > 2 | 2 > (1 row) > > > I only see one of the rows. I thought, well, ok, we cannot undo a commit, > and the other one must commit eventually. I was able to continue working > normally: > > mds1=# insert into mds1 values (3,3); > INSERT 0 1 > mds1=# insert into mds1 values (4,4); > INSERT 0 1 > mds1=# insert into mds1 values (5,5); > INSERT 0 1 > mds1=# insert into mds1 values (6,6); > INSERT 0 1 > > mds1=# select xmin,* from mds1; > xmin | col1 | col2 > -------+------+------ > 10420 | 4 | 4 > 10422 | 6 | 6 > 10312 | 2 | 2 > 10415 | 3 | 3 > 10421 | 5 | 5 > (5 rows) > > > Note xmin keeps increasing because we closed the transaction on GTM at the > "finish:" label. This may or may not be ok. > This should be OK, no? > > Meanwhile, on the failed data node: > > mds1=# select * from pg_prepared_xacts; > WARNING: Do not have a GTM snapshot available > WARNING: Do not have a GTM snapshot available > transaction | gid | prepared | owner | > database > > -------------+--------+-------------------------------+------------+---------- > 10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx | mds1 > (1 row) > > The transaction id is 10312. Normally this would still appear in snapshots, > but we close it on GTM. > > What should we do? > > - We could leave as is. We may in the future have an XC monitoring process > look for possible 2PC anomalies occasionally and send an alert so that they > could be resolved by a DBA. > I was thinking about an external utility that could clean up partially committed or prepared transactions when a node crash happens. This is a part of HA, so I think the only thing that should be corrected now is the way errors are managed in the case of a partially committed prepared transaction on nodes. A PANIC is not acceptable for this case. > - We could instead choose not close out the transaction on GTM, so that the > xid is still in snapshots. 
We could test if the rows are viewable or not. > This could result in other side effects, but without further testing, I am > guessing this may be similar to when an existing statement is running and > cannot see a previously committed transaction that is open in its snapshot. > So, I am thinking this is probably the preferable option (keeping it open on > GTM until committed on all nodes), but we should test it. In any event, we > should also fix the panic. > If we let it open the transaction open on GTM, how do we know the GXID that has been used for Commit (different from the one that has been used for PREPARE as I recall)? If we do a Commit prepare on the remaining node that crashed, we have to commit the former PREPARE GXID, the former COMMIT PREPARED GXID and also the GXID that is used to issue the new COMMIT PREPARED on the remaining node. It is easy to get the GXID used for former PREPARE and new COMMIT PREPARED. But there is no real way yet to get back the GXID used for the former COMMIT PREPARE. I would see two ways to correct that: 1) Save the former COMMIT PREPARED GXID in GTM, but this would really impact performance. 2) Save the COMMIT PREPARED GXID on Coordinator and let the GXACT open on Coordinator (would be the best solution, but the transaction has already been committed on Coordinator). That's why I think the transaction should be to close the transaction on GTM, and a monitoring agent would be in charge to commit on the remaining nodes that crashed if a partial COMMIT has been done. Btw, it is a complicated point, so other's opinion is completely welcome. Regards, -- Michael Paquier http://michaelpq.users.sourceforge.net |
From: Michael P. <mic...@gm...> - 2010-12-13 01:14:31
|
>> So here are the main lines I propose to fix that, with an implementation
>> inside portal.
>>
>> First, since DDL synchronizes commits, it is possible for Coordinators to
>> interact between themselves, so the query should be extended to:
>> -- EXECUTE DIRECT ON (COORDINATOR num | NODE num, ...) query
>> to be compared to what is in the current code:
>> -- EXECUTE DIRECT ON (COORDINATOR | NODE num, ...) query
>
> Sounds good. What about
>
> EXECUTE DIRECT ON ([COORDINATOR num[,num...]] [NODE num[,num...]]) query
>
> maybe it is useful to see all nodes at once with a single command.

EXECUTE DIRECT ON COORDINATOR * query; may also be possible. This way of
manipulating multiple node numbers at the same time, or even including all the
nodes at once, is already in gram.y. CLEAN CONNECTION also uses it.

> BTW, in GridSQL we optionally include the source node number in the tuples
> returned. We should add something similar at some point (don't need this now
> though). Similarly, something like a NODE() function would be nice, to even
> be able to do SELECT *,NODE().
>
> Are the coordinator numbers and node numbers separate? That is, can we have
> both coordinator 1 and data node 1?

We can have a Coordinator 1 and a Datanode 1. With the registration features
that will be added soon, nodes are differentiated by their types and their IDs.

>> Then, we have to modify query analysis in analyze.c.
>> There is an API in the code called transformExecDirectStmt that transforms
>> the query and changes its shape.
>> In the analyze part, you have to check if the query is launched locally or
>> not. If it is not local, change the node type to RemoteQuery to make it run
>> in ExecRemoteQuery when launching it.
>>
>> If it is local, you have to parse the query with parse_query and then
>> analyze it with parse_analyze. After parsing and analyzing, change its node
>> type to Query, to make it launch locally.
>>
>> The difficult part of this implementation does not seem to be the analyze
>> and parsing part; it is in the planner. The question is: should the query go
>> through pgxc_planner or the normal planner if it is local? Here is my
>> proposal: pgxc_planner looks to be better, but we have to put a flag (when
>> analyzing) in the query to be sure to launch the query on the correct nodes
>> when determining execution nodes in get_plan_nodes.
>
> Yeah, I think we could go either way, but we know that with EXECUTE DIRECT
> it will always be a single step, so I think it is OK to put it in
> pgxc_planner. It should be pretty straight-forward though, I think we just
> need to additionally set step->exec_nodes, which we know already from
> parsing. It may be that we need to extend this though to indicate to
> execute on specific Coordinators.

I agree. I had a look at the code and it should not be that complicated to fix
after all. The only difficulty, if it is one, is to set the execution node list
correctly when analyzing. It is also necessary to modify ExecRemoteQuery a
little so that it can execute on a single Coordinator or on multiple
Coordinators (not the case yet).

--
Michael Paquier
http://michaelpq.users.sourceforge.net
From: Mason S. <mas...@en...> - 2010-12-12 19:37:22
|
On 12/6/10 12:32 AM, Michael Paquier wrote:
> I changed deeply the algorithm to avoid code duplication for implicit 2PC.
> With the patch attached, the Coordinator is prepared only if at least 2
> Coordinators are involved in a transaction (DDL case).
> If only one Coordinator is involved in the transaction, or if the transaction
> does not contain any DDL, the transaction is prepared on the involved nodes
> only.
>
> To sum up:
> 1) for a DDL transaction (more than 1 Coordinator and more than 1 Datanode
>    involved in a transaction):
>    - prepare on Coordinator (2PC file written)
>    - prepare on Nodes (2PC file written)
>    - Commit prepared on Coordinator
>    - Commit prepared on Datanodes
> 2) if no Coordinator, or only one Coordinator, is involved in a transaction:
>    - prepare on nodes
>    - commit on Coordinator
>    - Commit on Datanodes
>
> Note: I didn't put calls to the implicit prepare functions in a separate
> function because the modifications of CommitTransaction() are really light.

I reviewed, and I thought it looked good, except for a possible issue with
committing.

I wanted to test what happened with implicit transactions when there was a
failure.

I executed this in one session:

mds1=# begin;
BEGIN
mds1=# insert into mds1 values (1,1);
INSERT 0 1
mds1=# insert into mds1 values (2,2);
INSERT 0 1
mds1=# commit;

Before committing, I fired up gdb for a coordinator session and a data node
session.

On one of the data nodes, when the COMMIT PREPARED was received, I killed the
backend to see what would happen. On the Coordinator I saw this:

WARNING:  unexpected EOF on datanode connection
WARNING:  Connection to Datanode 1 has unexpected state 1 and will be dropped
WARNING:  Connection to Datanode 2 has unexpected state 1 and will be dropped
ERROR:  Could not commit prepared transaction implicitely
PANIC:  cannot abort transaction 10312, it was already committed
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I am not sure we should be aborting 10312, since it was committed on one of the
nodes. It corresponds to the original prepared transaction. We also do not want
a panic to happen.

Next, I started a new coordinator session:

mds1=# select * from mds1;
 col1 | col2
------+------
    2 |    2
(1 row)

I only see one of the rows. I thought, well, ok, we cannot undo a commit, and
the other one must commit eventually. I was able to continue working normally:

mds1=# insert into mds1 values (3,3);
INSERT 0 1
mds1=# insert into mds1 values (4,4);
INSERT 0 1
mds1=# insert into mds1 values (5,5);
INSERT 0 1
mds1=# insert into mds1 values (6,6);
INSERT 0 1

mds1=# select xmin,* from mds1;
 xmin  | col1 | col2
-------+------+------
 10420 |    4 |    4
 10422 |    6 |    6
 10312 |    2 |    2
 10415 |    3 |    3
 10421 |    5 |    5
(5 rows)

Note xmin keeps increasing because we closed the transaction on GTM at the
"finish:" label. This may or may not be ok.

Meanwhile, on the failed data node:

mds1=# select * from pg_prepared_xacts;
WARNING:  Do not have a GTM snapshot available
WARNING:  Do not have a GTM snapshot available
 transaction |  gid   |           prepared            |   owner    | database
-------------+--------+-------------------------------+------------+----------
       10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx     | mds1
(1 row)

The transaction id is 10312. Normally this would still appear in snapshots, but
we close it on GTM.

What should we do?

- We could leave it as is. We may in the future have an XC monitoring process
  look for possible 2PC anomalies occasionally and send an alert so that they
  could be resolved by a DBA.

- We could instead choose not to close out the transaction on GTM, so that the
  xid is still in snapshots. We could test whether the rows are viewable or
  not. This could result in other side effects, but without further testing, I
  am guessing this may be similar to when an existing statement is running and
  cannot see a previously committed transaction that is open in its snapshot.
  So I am thinking this is probably the preferable option (keeping it open on
  GTM until committed on all nodes), but we should test it. In any event, we
  should also fix the panic.

It may be that we had a similar problem in the existing code before this patch,
although I did some testing a few months back with Pavan's crash test patch and
things seemed stable.

Also, we might want to check that explicit 2PC also handles this OK.

Thanks,

Mason

> Regards,
>
> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains information
from EnterpriseDB Corporation that may be privileged, confidential, or exempt
from disclosure under applicable law. If you are not the intended recipient or
authorized to receive this for the intended recipient, any use, dissemination,
distribution, retention, archiving, or copying of this communication is
strictly prohibited. If you have received this e-mail in error, please notify
the sender immediately by reply e-mail and delete this message.
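To make the failure window concrete, here is a rough sketch of what the
implicit two-phase commit amounts to on each Datanode in this test, using the
GID 'T10312' seen above; the exact commands the Coordinator drives internally
are simplified here, not quoted from the code:

    -- Phase 1: sent to every Datanode touched by the transaction.
    PREPARE TRANSACTION 'T10312';

    -- Phase 2: sent to each Datanode once every prepare has succeeded.
    -- Killing a Datanode backend at this point is what produces the
    -- partially committed state discussed in this thread.
    COMMIT PREPARED 'T10312';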
From: Mason S. <mas...@en...> - 2010-12-10 15:35:05
|
On 12/9/10 7:02 PM, Michael Paquier wrote: > Hi all, > > I had a look at the code to try to understand how it could be able to > fix EXECUTE DIRECT. > It is going to be needed to implement some HA features, and I believe > users would find this functionality useful if fixed. > > I am seeing two ways to fix it now. > The first one could be to do the implementation out of Portal as we > did before moving XC query execution there. > This was the first implementation of EXECUTE DIRECT that was done. > It looks to be an easy solution, but if we have a look long-term, it > does not follow the will to move query execution inside portal. I agree- it may be ok to do it outside and grab some old code, but I think we should try and use the current code and make it work. > > So here are the main lines I propose to fix that, with an > implementation inside portal. > > First, since DDL synchronize commit, it is possible Coordinators to > interact between themselves, > so the query should be extended to: > -- EXECUTE DIRECT ON (COORDINATOR num | NODE num, ...) query > to be compared to what is in the current code: > -- EXECUTE DIRECT ON (COORDINATOR | NODE num, ...) query Sounds good. What about EXECUTE DIRECT ON ([COORDINATOR num[,num...]] [NODE num[,num...]]) query maybe it is useful to see on all nodes at once with a single command. BTW, in GridSQL we optionally include the source node number in the tuples returned. We should add something similar at some point (don't need this now though). Similarly, something like a NODE() function would be nice, to even be able to do SELECT *,NODE(). Are the coordinator numbers and node numbers are separate? That is, we can have both coordinator 1 and data node 1? > > Then, we have to modify query analyze in analyze.c. > There is an API in the code called transformExecDirectStmt that > transforms the query and changes its shape. > In the analyze part, you have to check if the query is launched > locally or not. > If it is not local, change the node type to Remote Query to make it > run in ExecRemoteQuery when launching it. > > If it is local, you have to parse the query with parse_query and then > to analyze it with parse_analyze. > After parsing and analyzing, change its node type to Query, to make it > launch locally. > > The difficult part of this implementation does not seem to be the > analyze and parsing part, it is in the planner. > The question is: > Should the query go through pgxc_planner or normal planner if it is local? > Here is my proposal: > pgxc_planner looks to be better but we have to put a flag (when > analyzing) in the query to be sure > to launch the query on the correct nodes when determining execution > nodes in get_plan_nodes. > Yeah, I think we could go either way, but we know that with EXECUTE DIRECT it will always be a single step, so I think it is OK to put it in pgxc_planner. It should be pretty straight-forward though, I think we just need to additionally set step->exec_nodes, which we know already from parsing. It may be that we need to extend this though to indicate to execute on specific Coordinators. Overall, I don't think that this should be difficult to get working again. Mason > Regards, > > ------ > Michael Paquier > http://michaelpq.users.sourceforge.net > > > ------------------------------------------------------------------------------ > > > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... 
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers -- Mason Sharp EnterpriseDB Corporation The Enterprise Postgres Company This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Michael P. <mic...@gm...> - 2010-12-10 00:02:53
|
Hi all,

I had a look at the code to try to understand how to fix EXECUTE DIRECT. It is
going to be needed to implement some HA features, and I believe users would
find this functionality useful if fixed.

I see two ways to fix it now.

The first one would be to do the implementation outside of the Portal, as we
did before moving XC query execution there. This was the first implementation
of EXECUTE DIRECT. It looks like an easy solution, but taking a long-term view,
it does not follow the intent to move query execution inside the Portal.

So here are the main lines I propose to fix it, with an implementation inside
the Portal.

First, since DDL synchronizes commits, it is possible for Coordinators to
interact between themselves, so the query should be extended to:

-- EXECUTE DIRECT ON (COORDINATOR num | NODE num, ...) query

to be compared to what is in the current code:

-- EXECUTE DIRECT ON (COORDINATOR | NODE num, ...) query

Then, we have to modify query analysis in analyze.c. There is an API in the
code called transformExecDirectStmt that transforms the query and changes its
shape. In the analysis part, you have to check whether the query is launched
locally or not. If it is not local, change the node type to RemoteQuery to make
it run in ExecRemoteQuery when launching it. If it is local, you have to parse
the query with parse_query and then analyze it with parse_analyze. After
parsing and analyzing, change its node type to Query, to make it launch
locally.

The difficult part of this implementation does not seem to be the analysis and
parsing; it is in the planner. The question is: should the query go through
pgxc_planner or the normal planner if it is local? Here is my proposal:
pgxc_planner looks to be better, but we have to put a flag (when analyzing) in
the query to be sure to launch the query on the correct nodes when determining
execution nodes in get_plan_nodes.

Regards,

------
Michael Paquier
http://michaelpq.users.sourceforge.net
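A few hypothetical invocations of the extended syntax being proposed, for
illustration only; whether the query is written as a quoted string, and the
node numbers used, are assumptions rather than part of the proposal:

    -- Run a query directly on one Datanode:
    EXECUTE DIRECT ON (NODE 1) 'SELECT * FROM pg_prepared_xacts';

    -- Run it on several Datanodes at once:
    EXECUTE DIRECT ON (NODE 1, NODE 2) 'SELECT count(*) FROM mds1';

    -- With the proposed extension, target a specific remote Coordinator:
    EXECUTE DIRECT ON (COORDINATOR 2) 'SELECT * FROM pg_prepared_xacts';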
From: Mason S. <mas...@en...> - 2010-12-09 13:39:24
|
On 12/8/10 8:11 PM, xiong wang wrote:
> Hi Koichi,
>
> Yes, I consider sequences should be created on datanodes, not only on
> coordinators. But all the sequence values should come from GTM.
>
> Regards,
> Benny
>
> 2010/12/9 Koichi Suzuki <ko...@in...>:
>> In the current implementation, the sequence value is supplied by GTM, as
>> you know. It is assumed that this value is supplied to the datanode through
>> the coordinator. In your case, the default value must be handled by the
>> datanode, and the datanode has to inquire GTM for the nextval of the
>> sequence.
>>
>> I'm afraid this is missing in the current code.
>> ---
>> Koichi

Benny,

In general we try to have the Coordinator manage everything and provide the
data nodes with everything they need.

I can think of a case that we should test though: when COPY is used. We would
have to make sure that we are providing values for the sequence column if it is
not included in an explicit column list.

Thanks,

Mason

>> (2010年12月08日 19:33), xiong wang wrote:
>>> Dears,
>>>
>>> steps:
>>> postgres=# create sequence seq start with 1;
>>> CREATE SEQUENCE
>>> postgres=# create table t(a int default nextval('seq'), b int);
>>> ERROR: Could not commit (or autocommit) data node connection
>>>
>>> datanode log as follows:
>>> LOG: statement: create table t(a int default nextval('seq'), b int);
>>> ERROR: relation "seq" does not exist
>>>
>>> When I checked the source code, I found sequences can't be created on
>>> datanodes. Could you explain why?
>>>
>>> Regards,
>>> Benny

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
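A minimal sketch of the COPY test case suggested above, reusing Benny's 'seq'
example; the data values are made up:

    CREATE SEQUENCE seq;
    CREATE TABLE t (a int DEFAULT nextval('seq'), b int);

    -- The column list omits "a", so the Coordinator must fill in the
    -- GTM-supplied sequence values before routing rows to the Datanodes.
    COPY t (b) FROM STDIN;
    10
    20
    \.

    SELECT a, b FROM t;   -- "a" should be 1 and 2, in some order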
From: Mason S. <mas...@en...> - 2010-12-09 13:30:16
|
Thanks for reporting this. This is an issue that is already in the tracker,
issue 3019765.

Thanks,

Mason

On 12/9/10 6:06 AM, xiong wang wrote:
> Dears,
>
> steps:
>
> postgres=# begin;
> BEGIN
> postgres=# create sequence seq2;
> CREATE SEQUENCE
> postgres=# rollback;
> ROLLBACK
> postgres=# create sequence seq2;
> ERROR: GTM error, could not create sequence
>
> GTM log:
> 2:1089829184:2010-12-09 19:02:41.941 CST -LOG: Sequence with the given key already exists
> LOCATION: seq_add_seqinfo, gtm_seq.c:169
> 3:1089829184:2010-12-09 19:02:41.941 CST -ERROR: Failed to open a new sequence
> LOCATION: ProcessSequenceInitCommand, gtm_seq.c:751
>
> When a transaction is rolled back, the sequence created in the transaction
> can't be removed from GTM.
>
> Regards,
> Benny

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
From: xiong w. <wan...@gm...> - 2010-12-09 11:06:42
|
Dears,

Steps:

postgres=# begin;
BEGIN
postgres=# create sequence seq2;
CREATE SEQUENCE
postgres=# rollback;
ROLLBACK
postgres=# create sequence seq2;
ERROR: GTM error, could not create sequence

GTM log:

2:1089829184:2010-12-09 19:02:41.941 CST -LOG: Sequence with the given key already exists
LOCATION: seq_add_seqinfo, gtm_seq.c:169
3:1089829184:2010-12-09 19:02:41.941 CST -ERROR: Failed to open a new sequence
LOCATION: ProcessSequenceInitCommand, gtm_seq.c:751

When a transaction is rolled back, the sequence created in the transaction
can't be removed from GTM.

Regards,
Benny
From: Koichi S. <ko...@in...> - 2010-12-09 01:15:12
|
This should be put into the tracker. I discussed it with Michael and found that
the issue is not that simple, because we have to consider the case of a
replicated table. In that case it is not correct for the Datanode to get the
sequence value directly from GTM; the Coordinator should handle it.

Regards;
---
Koichi

(2010年12月09日 10:11), xiong wang wrote:
> Hi Koichi,
>
> Yes, I consider sequences should be created on datanodes, not only on
> coordinators. But all the sequence values should come from GTM.
>
> Regards,
> Benny
>
> 2010/12/9 Koichi Suzuki<ko...@in...>:
>> In the current implementation, the sequence value is supplied by GTM, as
>> you know. It is assumed that this value is supplied to the datanode through
>> the coordinator. In your case, the default value must be handled by the
>> datanode, and the datanode has to inquire GTM for the nextval of the
>> sequence.
>>
>> I'm afraid this is missing in the current code.
>> ---
>> Koichi
>>
>> (2010年12月08日 19:33), xiong wang wrote:
>>> Dears,
>>>
>>> steps:
>>> postgres=# create sequence seq start with 1;
>>> CREATE SEQUENCE
>>> postgres=# create table t(a int default nextval('seq'), b int);
>>> ERROR: Could not commit (or autocommit) data node connection
>>>
>>> datanode log as follows:
>>> LOG: statement: create table t(a int default nextval('seq'), b int);
>>> ERROR: relation "seq" does not exist
>>>
>>> When I checked the source code, I found sequences can't be created on
>>> datanodes. Could you explain why?
>>>
>>> Regards,
>>> Benny
From: xiong w. <wan...@gm...> - 2010-12-09 01:11:12
|
Hi Koichi,

Yes, I think sequences should be created on the datanodes as well, not only
on the coordinators. But all sequence values should come from GTM.

Regards,
Benny

2010/12/9 Koichi Suzuki <ko...@in...>:
> In the current implementation, the sequence value is supplied by GTM, as you
> know. It is assumed that this value is supplied to the datanode through
> the coordinator. In your case, the default value must be handled by the
> datanode, and the datanode has to ask GTM for the nextval of the sequence.
>
> I'm afraid this is missing in the current code.
> ---
> Koichi
>
> (2010年12月08日 19:33), xiong wang wrote:
>> Dears,
>>
>> steps:
>> postgres=# create sequence seq start with 1;
>> CREATE SEQUENCE
>> postgres=# create table t(a int default nextval('seq'), b int);
>> ERROR: Could not commit (or autocommit) data node connection
>>
>> datanode log as follows:
>> LOG: statement: create table t(a int default nextval('seq'), b int);
>> ERROR: relation "seq" does not exist
>>
>> When I checked the source code, I found that sequences can't be created on
>> datanodes. Could you explain why?
>>
>> Regards,
>> Benny
|
From: Koichi S. <ko...@in...> - 2010-12-09 00:36:50
|
In the current implementation, the sequence value is supplied by GTM, as you
know. It is assumed that this value is supplied to the datanode through the
coordinator. In your case, the default value must be handled by the datanode,
and the datanode has to ask GTM for the nextval of the sequence.

I'm afraid this is missing in the current code.
---
Koichi

(2010年12月08日 19:33), xiong wang wrote:
> Dears,
>
> steps:
> postgres=# create sequence seq start with 1;
> CREATE SEQUENCE
> postgres=# create table t(a int default nextval('seq'), b int);
> ERROR: Could not commit (or autocommit) data node connection
>
> datanode log as follows:
> LOG: statement: create table t(a int default nextval('seq'), b int);
> ERROR: relation "seq" does not exist
>
> When I checked the source code, I found that sequences can't be created on
> datanodes. Could you explain why?
>
> Regards,
> Benny
|
From: xiong w. <wan...@gm...> - 2010-12-08 10:33:46
|
Dears,

steps:
postgres=# create sequence seq start with 1;
CREATE SEQUENCE
postgres=# create table t(a int default nextval('seq'), b int);
ERROR: Could not commit (or autocommit) data node connection

datanode log as follows:
LOG: statement: create table t(a int default nextval('seq'), b int);
ERROR: relation "seq" does not exist

When I checked the source code, I found that sequences can't be created on
datanodes. Could you explain why?

Regards,
Benny
|
From: Michael P. <mic...@gm...> - 2010-12-08 07:24:49
|
Continuing the modifications for 2PC, I have finished a patch extending the
2PC file and the 2PC transaction data protocol.

With the patch attached, the 2PC data contains the following information:
- whether the 2PC is implicit or explicit
- whether the prepared transaction contained DDL or not (if yes, it means that
  the transaction has also been prepared on Coordinators)
- the number of the Coordinator from which the 2PC was issued
- the list of nodes where the transaction has been prepared. In the case of a
  transaction prepared only on Coordinators, the list of nodes is set to "n"
  (case of sequence transactions)

This 2PC information is sent down to the nodes when an implicit or explicit
prepare is made, and only to the necessary nodes.

The patch also contains an extension of the view pg_prepared_xacts, so that
the extended 2PC information can be obtained from the catalog along with the
usual 2PC data.

If you want to test, first apply implicit2pc6.patch on HEAD, then apply
implicit2pc6_extend_pg_prepared_xacts.patch.

I forgot to say that, as pg_prepared_xacts is a catalog view, you have to
connect to a node directly to get the 2PC information. The information could
also be obtained with EXECUTE DIRECT, but currently this functionality is
broken.

In my tests I of course checked that the views were OK, but I also checked
that recovery of prepared transactions was done properly. If you want to try,
kill a postgres process with SIGQUIT (kill -3) and relaunch it. 2PC data will
be recovered from the 2PC files correctly. You can check by running
"select * from pg_prepared_xacts;" before and after stopping the postgres
instance.

Regards,

Michael
--
Michael Paquier
http://michaelpq.users.sourceforge.net
|
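A quick sketch of the recovery check described above. The gid 'check_2pc_recovery' and the table name are arbitrary; only the stock columns of pg_prepared_xacts are queried, since the exact names of the extended columns depend on the patch, and max_prepared_transactions is assumed to be non-zero on the node:

-- Prepare a transaction, then restart the node and confirm it is still listed:
BEGIN;
CREATE TABLE t_2pc_check (a int);
PREPARE TRANSACTION 'check_2pc_recovery';

-- On the node, before and after "kill -3" + restart:
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- The row for 'check_2pc_recovery' should survive the restart, showing that
-- it was rebuilt from the 2PC file. Clean up afterwards with:
COMMIT PREPARED 'check_2pc_recovery';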
From: Mason S. <mas...@en...> - 2010-12-07 08:01:09
|
How about deparsing into the same number of insert statements as there are
nodes? That way we just send one insert to each.

I think if the user/app uses this, they may be concerned about efficiency and
performance, so passing just a single statement to each data node is a good
idea.

Regards,

Mason

Sent from my iPhone

On Dec 7, 2010, at 8:56 AM, xiong wang <wan...@gm...> wrote:

> Dears,
>
> I have two solutions to resolve bug #3013562.
>
> 1. I will rewrite the insert statement if it is a multiple insert. I will
> deparse the multiple insert statement into single insert statements, one per
> entry in the values list, in the function pg_rewrite_query.
> 2. I would like to use a prepared statement to replace a multiple insert.
> In other words, while executing a multiple insert statement, I will replace
> the multiple insert statement node with a prepared statement node.
>
> The main idea of both solutions is to divide a multiple insert into single
> inserts. I don't know whether the two methods are feasible or which one
> would be better. Could you give me some suggestions?
>
> If you have another idea, please don't hesitate to tell me.
>
> Thanks.
>
> Regards,
> Benny
|
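A sketch of the per-node grouping suggested above, on a hypothetical table distributed by hash across two datanodes; the row-to-node assignment shown is made up for illustration:

-- Client sends one multi-row insert:
INSERT INTO mytab VALUES (1, 10), (2, 20), (3, 30), (4, 40);

-- Instead of one statement per row, the coordinator would deparse one
-- statement per target datanode:
-- sent to datanode 1 (rows whose hash lands there):
INSERT INTO mytab VALUES (1, 10), (3, 30);
-- sent to datanode 2:
INSERT INTO mytab VALUES (2, 20), (4, 40);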
From: xiong w. <wan...@gm...> - 2010-12-07 07:56:53
|
Dears,

I have two solutions to resolve bug #3013562.

1. I will rewrite the insert statement if it is a multiple insert. I will
deparse the multiple insert statement into single insert statements, one per
entry in the values list, in the function pg_rewrite_query.
2. I would like to use a prepared statement to replace a multiple insert.
In other words, while executing a multiple insert statement, I will replace
the multiple insert statement node with a prepared statement node.

The main idea of both solutions is to divide a multiple insert into single
inserts. I don't know whether the two methods are feasible or which one would
be better. Could you give me some suggestions?

If you have another idea, please don't hesitate to tell me.

Thanks.

Regards,
Benny
|
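For reference, a sketch of what solution 1 amounts to at the SQL level; the rewrite itself would happen inside pg_rewrite_query, and the statements below are purely illustrative:

-- The rewriter would expand a multi-row insert
INSERT INTO mytab VALUES (1, 10), (2, 20), (3, 30);
-- into one single-row statement per VALUES entry:
INSERT INTO mytab VALUES (1, 10);
INSERT INTO mytab VALUES (2, 20);
INSERT INTO mytab VALUES (3, 30);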
From: Michael P. <mic...@gm...> - 2010-12-06 05:32:25
|
> I think we should try to minimize changes to CommitTransaction. Why not use
> the PrepareTransaction() to prepare the transaction instead of duplicating
> that code inside CommitTransaction? Also, it would be nice if you could move
> the new code into a separate function and call that, something like
> AtEOXact_PGXC().

Hi all,

I have deeply changed the algorithm to avoid code duplication for implicit 2PC.

With the patch attached, the Coordinator is prepared only if at least 2
Coordinators are involved in a transaction (DDL case). If only one Coordinator
is involved in the transaction, or if the transaction does not contain any DDL,
the transaction is prepared on the involved nodes only.

To sum up:
1) For a DDL transaction (more than 1 Coordinator and more than 1 Datanode
involved in the transaction):
- prepare on Coordinator (2PC file written)
- prepare on Nodes (2PC file written)
- Commit prepared on Coordinator
- Commit prepared on Datanodes

2) If no Coordinator, or only one Coordinator, is involved in the transaction:
- prepare on nodes
- commit on Coordinator
- Commit on Datanodes

Note: I didn't put the calls to the implicit prepare functions in a separate
function because the modifications to CommitTransaction() are really light.

Regards,
--
Michael Paquier
http://michaelpq.users.sourceforge.net
|
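A rough SQL-level sketch of the commands case 1 above implies on each node; the GXID-based identifier 'T1234' is assumed here, and the internal message flow is of course driven by the Coordinator rather than typed by hand:

-- On every involved node (local Coordinator, remote Coordinators, Datanodes),
-- as part of the implicit 2PC:
PREPARE TRANSACTION 'T1234';   -- 2PC file written on each node

-- Once all prepares have succeeded:
COMMIT PREPARED 'T1234';       -- first on the Coordinator(s)
COMMIT PREPARED 'T1234';       -- then on each Datanode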
From: Mason S. <mas...@en...> - 2010-12-03 21:32:18
|
On 12/3/10 7:10 PM, Andrei.Martsinchyk wrote:
> Mason,
>
> 2010/12/3 Mason Sharp<mas...@en...>:
>> Sent from my iPhone
>>
>> On Dec 3, 2010, at 5:31 PM, "Andrei.Martsinchyk"<and...@en...> wrote:
>>> Mason,
>>>
>>> 2010/12/3 Mason Sharp<mas...@en...>:
>>>> On 12/1/10 1:53 PM, Andrei.Martsinchyk wrote:
>>>>> Hi Benny,
>>>>>
>>>>> Thanks for pointing this out. I tested with a program using the Postgres C
>>>>> library and the extended query protocol.
>>>>> For you and anyone else who wants to test, I am attaching the test
>>>>> program and a simple Makefile.
>>>>> I fixed the segmentation fault (updated patch is attached), but
>>>>> anyway, the PREPARE / EXECUTE commands do not work properly.
>>>>>
>>>> It looks like it is still not quite right:
>>>>
>>>> mds1=# create table mytab (col1 int, col2 int);
>>>> CREATE TABLE
>>>> mds1=# prepare p (int, int) AS INSERT INTO mytab VALUES ($1, $2);
>>>> PREPARE
>>>> mds1=# execute p (1,2);
>>>> INSERT 0 1
>>>> mds1=# select * from mytab;
>>>>  col1 | col2
>>>> ------+------
>>>> (0 rows)
>>>>
>>>> It does not find the row that should have been inserted.
>>>>
>>> Yes, I mentioned the PREPARE command does not work properly.
>>> In this particular case it is inserting the row into the Coordinator database,
>>> and it is not visible to a select.
>>>
>> Oh. So, only SELECT is currently handled?
>>
> Some SELECTs work properly, not all.
>
Only single-step ones? That is fine. Are any other SELECT statements
problematic?

How about UPDATE and DELETE? I just ran a simple UPDATE, and it failed, too.

>>>> Also, one other question: the session seems to retain the fact that there
>>>> are associated prepared statements. Does that mean that the pooler will not
>>>> put these back in the pool until all are deallocated?
>>>>
>>> The Coordinator does not prepare statements on datanodes, so the connection
>>> can be released at the transaction end.
>>>
>> It converts it into a simple statement? I think we need to support prepare
>> and execute on the data nodes for performance.
>>
> It does not seem straightforward. A prepared statement is a cached plan,
> and it is cached on the coordinator. If the plan contains multiple
> RemoteQuery nodes we should prepare each, and should not release these
> until they are all closed. We should be holding the data node
> connections all this time.
>
If it is just a matter of holding on to the connections, that is fine, we can
persist those for the duration of the session. Do you need the RemoteQuery
nodes to persist, too, or just the connections?

I understand that we may have to track which connections a statement has
already been prepared on, and which ones it has not yet. At EXECUTE time, if a
data node has not been prepared yet, we send down the prepare message first.

>>>> On a related note, for the WITH HOLD cursors you implemented, did you also
>>>> do something to hold on to the connections?
>>>>
>>> I did not implement WITH HOLD.
>>>
>> Let me retest this, and look at old emails later when back at my laptop.
>>
> I remember someone told me it works, but I never tested it, and never did
> anything to handle it.
>
I tried it out:

mds1=# begin;
BEGIN
mds1=# declare c cursor with hold for select * from mds1;
DECLARE CURSOR
mds1=# fetch c;
 col1 | col2
------+------
    1 |   10
(1 row)

mds1=# commit;
COMMIT
mds1=# fetch c;
 col1 | col2
------+------
    3 |   30
(1 row)

This worries me a bit that it got "fixed" unintentionally. We may have gotten
lucky in that we refetched the same connection(s) from the pooler.
Meanwhile, the Coordinator still knows about cursor c, so it did not object.

It may be that the only thing we need to do is, if we have any open hold
cursors, we do not return the connections to the pool but persist them. I also
expect we would add to this over time, like, if the user created any temp
tables (and has not dropped them), persist the connections (similar to GridSQL).

Thanks,

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
|
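A sketch of the lazy per-node prepare idea discussed above, using a hypothetical hash-distributed table mytab; which datanode each row maps to is made up for illustration:

-- Session-level prepared statement on the coordinator:
PREPARE p (int, int) AS INSERT INTO mytab VALUES ($1, $2);

EXECUTE p (1, 2);  -- suppose the row maps to datanode 2: the statement has not
                   -- been prepared there yet, so the coordinator sends the
                   -- datanode-side prepare first, then executes it
EXECUTE p (3, 4);  -- maps to datanode 2 again: already prepared, only an
                   -- execute message is sent
EXECUTE p (5, 6);  -- maps to datanode 1: a prepare is sent to datanode 1 now,
                   -- then the execute

-- The datanode connections involved would be held by the session (not
-- returned to the pool) until the statement is deallocated or the session ends:
DEALLOCATE p;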