From: Koichi S. <koi...@gm...> - 2011-09-30 04:50:30
Are there any statements to start a new session while holding the connection? Writing on a bus from the airport, so I don't have the reference at hand.
---
Koichi Suzuki

On 2011/09/30, at 13:08, Michael Paquier <mic...@gm...> wrote:

> A new idea to solve this issue came to my mind:
> destroy the connection slot on the pooler if temporary objects are on it. This will clean up the backends correctly, I think.
> This is perhaps the easiest way to do it; it is clean, but it may hurt performance for applications that use a lot of temporary objects, as each session will close its connections to the other datanodes to clean everything up.
>
> On Fri, Sep 30, 2011 at 11:54 AM, Michael Paquier <mic...@gm...> wrote:
> I think I found the origin of the problem.
> When ending a session, a DISCARD query is automatically run from the pooler to clean up connections before putting them back into the pool.
> However, this query needs a transaction ID to commit normally in autocommit mode, but it cannot obtain one because the pooler does not send down a transaction ID at session end.
>
> LOG: statement: DISCARD ALL;
> DEBUG: Local snapshot is built, xmin: 10003, xmax: 10003, xcnt: 0, RecentGlobalXmin: 10003
> STATEMENT: DISCARD ALL;
> LOG: Falling back to local Xid. Was = 0, now is = 10003
> STATEMENT: DISCARD ALL;
> DEBUG: Record transaction commit 10003
>
> I am thinking about the following solution:
> adding a new session parameter that can force the backends of a session to get a GXID from GTM, ensuring that the commit ID is unique in the cluster.
> The attached patch implements that, but it does not seem to work yet.
>
> Any thoughts?
>
> --
> Michael Paquier
> http://michael.otacoo.com

_______________________________________________
Postgres-xc-developers mailing list
Pos...@li...
https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
From: Michael P. <mic...@gm...> - 2011-09-30 04:08:24
A new idea to solve this issue came to my mind:
destroy the connection slot on the pooler if temporary objects are on it. This will clean up the backends correctly, I think.
This is perhaps the easiest way to do it; it is clean, but it may hurt performance for applications that use a lot of temporary objects, as each session will close its connections to the other datanodes to clean everything up.

On Fri, Sep 30, 2011 at 11:54 AM, Michael Paquier <mic...@gm...> wrote:

> I think I found the origin of the problem.
> When ending a session, a DISCARD query is automatically run from the pooler to clean up connections before putting them back into the pool.
> However, this query needs a transaction ID to commit normally in autocommit mode, but it cannot obtain one because the pooler does not send down a transaction ID at session end.
>
> LOG: statement: DISCARD ALL;
> DEBUG: Local snapshot is built, xmin: 10003, xmax: 10003, xcnt: 0, RecentGlobalXmin: 10003
> STATEMENT: DISCARD ALL;
> LOG: Falling back to local Xid. Was = 0, now is = 10003
> STATEMENT: DISCARD ALL;
> DEBUG: Record transaction commit 10003
>
> I am thinking about the following solution:
> adding a new session parameter that can force the backends of a session to get a GXID from GTM, ensuring that the commit ID is unique in the cluster.
> The attached patch implements that, but it does not seem to work yet.
>
> Any thoughts?
>
> --
> Michael Paquier
> http://michael.otacoo.com

--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-30 02:54:55
I think I found the origin of the problem.
When ending a session, a DISCARD query is automatically run from the pooler to clean up connections before putting them back into the pool. However, this query needs a transaction ID to commit normally in autocommit mode, but it cannot obtain one because the pooler does not send down a transaction ID at session end.

LOG: statement: DISCARD ALL;
DEBUG: Local snapshot is built, xmin: 10003, xmax: 10003, xcnt: 0, RecentGlobalXmin: 10003
STATEMENT: DISCARD ALL;
LOG: Falling back to local Xid. Was = 0, now is = 10003
STATEMENT: DISCARD ALL;
DEBUG: Record transaction commit 10003

I am thinking about the following solution: adding a new session parameter that can force the backends of a session to get a GXID from GTM, ensuring that the commit ID is unique in the cluster. The attached patch implements that, but it does not seem to work yet.

Any thoughts?
--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-30 01:11:24
I have been able to isolate that the error can show up in multiple places, but its origin looks to be COMMIT PREPARED. After a couple of tests, I found that COMMIT PREPARED fails on a clog assertion:

#2  0x00000000008516da in ExceptionalCondition (conditionName=0x8c4ff8 "!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", errorType=0x8c4ec7 "FailedAssertion", fileName=0x8c4ec0 "clog.c", lineNumber=358) at assert.c:57
#3  0x00000000004b4be9 in TransactionIdSetStatusBit (xid=20844, status=1, lsn=..., slotno=0) at clog.c:355
#4  0x00000000004b4a64 in TransactionIdSetPageStatus (xid=20844, nsubxids=0, subxids=0x2ca1440, status=1, lsn=..., pageno=0) at clog.c:309
#5  0x00000000004b47c3 in TransactionIdSetTreeStatus (xid=20844, nsubxids=0, subxids=0x2ca1440, status=1, lsn=...) at clog.c:182
#6  0x00000000004b563d in TransactionIdCommitTree (xid=20844, nxids=0, xids=0x2ca1440) at transam.c:266
#7  0x00000000004d9e4c in RecordTransactionCommitPrepared (xid=20844, nchildren=0, children=0x2ca1440, nrels=0, rels=0x2ca1440, ninvalmsgs=2, invalmsgs=0x2ca1440, initfileinval=0 '\000') at twophase.c:2043
#8  0x00000000004d8713 in FinishPreparedTransaction (gid=0x2d5fa50 "T20844", isCommit=1 '\001') at twophase.c:1308
#9  0x00000000007555ab in standard_ProcessUtility (parsetree=0x2d5fa70, queryString=0x2d5f058 "COMMIT PREPARED 'T20844'", params=0x0, isTopLevel=1 '\001', dest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at utility.c:530
#10 0x00000000007550ee in ProcessUtility (parsetree=0x2d5fa70, queryString=0x2d5f058 "COMMIT PREPARED 'T20844'", params=0x0, isTopLevel=1 '\001', dest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at utility.c:354
#11 0x0000000000753f30 in PortalRunUtility (portal=0x2ca3c48, utilityStmt=0x2d5fa70, isTopLevel=1 '\001', dest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at pquery.c:1218
#12 0x00000000007541c1 in PortalRunMulti (portal=0x2ca3c48, isTopLevel=1 '\001', dest=0x2d5fdf8, altdest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at pquery.c:1362
#13 0x0000000000753641 in PortalRun (portal=0x2ca3c48, count=9223372036854775807, isTopLevel=1 '\001', dest=0x2d5fdf8, altdest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at pquery.c:843
#14 0x000000000074d017 in exec_simple_query (query_string=0x2d5f058 "COMMIT PREPARED 'T20844'") at postgres.c:1088
#15 0x00000000007514c2 in PostgresMain (argc=2, argv=0x2c85c80, username=0x2c85c00 "michael") at postgres.c:4105
#16 0x00000000006f791b in BackendRun (port=0x2cb4f50) at postmaster.c:3786
#17 0x00000000006f6f79 in BackendStartup (port=0x2cb4f50) at postmaster.c:3466
#18 0x00000000006f3e0a in ServerLoop () at postmaster.c:1530
#19 0x00000000006f35ab in PostmasterMain (argc=7, argv=0x2c82b60) at postmaster.c:1191
#20 0x000000000065efa9 in main (argc=7, argv=0x2c82b60) at main.c:199

The real issue here looks to be the commit tree, which behaves oddly at COMMIT PREPARED.

After the first crash there is an additional behavior: the datanode servers try to restart as usual, but they hit the same inconsistent state and stop abruptly during recovery:

#0  0x00007f728abd9a75 in raise () from /lib/libc.so.6
#1  0x00007f728abdd5c0 in abort () from /lib/libc.so.6
#2  0x00000000008516da in ExceptionalCondition (conditionName=0x8c4ff8 "!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", errorType=0x8c4ec7 "FailedAssertion", fileName=0x8c4ec0 "clog.c", lineNumber=358) at assert.c:57
#3  0x00000000004b4be9 in TransactionIdSetStatusBit (xid=20844, status=1, lsn=..., slotno=0) at clog.c:355
#4  0x00000000004b4a64 in TransactionIdSetPageStatus (xid=20844, nsubxids=0, subxids=0x2cc4ef8, status=1, lsn=..., pageno=0) at clog.c:309
#5  0x00000000004b47c3 in TransactionIdSetTreeStatus (xid=20844, nsubxids=0, subxids=0x2cc4ef8, status=1, lsn=...) at clog.c:182
#6  0x00000000004b563d in TransactionIdCommitTree (xid=20844, nxids=0, xids=0x2cc4ef8) at transam.c:266
#7  0x00000000004bbb8f in xact_redo_commit (xlrec=0x2cc4ed8, xid=20844, lsn=...) at xact.c:5074
#8  0x00000000004bc038 in xact_redo (lsn=..., record=0x2cc4eb0) at xact.c:5275
#9  0x00000000004c9e72 in StartupXLOG () at xlog.c:6665
#10 0x00000000004d02ff in StartupProcessMain () at xlog.c:10069
#11 0x00000000004f87c3 in AuxiliaryProcessMain (argc=2, argv=0x7fff51668760) at bootstrap.c:434
#12 0x00000000006f7f7a in StartChildProcess (type=StartupProcess) at postmaster.c:4684
#13 0x00000000006f6b39 in PostmasterStateMachine () at postmaster.c:3275
#14 0x00000000006f5c7a in reaper (postgres_signal_arg=17) at postmaster.c:2726
#15 <signal handler called>
#16 0x00007f728ac84fd3 in select () from /lib/libc.so.6
#17 0x00000000006f3cd9 in ServerLoop () at postmaster.c:1490
#18 0x00000000006f35ab in PostmasterMain (argc=7, argv=0x2c90b60) at postmaster.c:1191
#19 0x000000000065efa9 in main (argc=7, argv=0x2c90b60) at main.c:199

This analysis is in progress, but I have an idea of the origin.
--
Michael Paquier
http://michael.otacoo.com