From: Koichi S. <koi...@gm...> - 2011-09-30 04:50:30
|
Is there any statement to start a new session holding the connection? I'm writing from a bus from the airport and don't have a reference at hand, though.
---
Koichi Suzuki

On 2011/09/30, at 13:08, Michael Paquier <mic...@gm...> wrote:

> A new idea to solve this issue came to my mind:
> destroy the connection slot on the pooler if temporary objects are on it. This will clean up the backends correctly, I think.
> This is perhaps the easiest way to do it; it is clean, but it may impact performance for applications using a lot of temporary objects, as each session will close the connections to the other datanodes to clean everything up.
>
> On Fri, Sep 30, 2011 at 11:54 AM, Michael Paquier <mic...@gm...> wrote:
> I think I found the origin of the problem.
> When ending a session, a DISCARD query is automatically run from the pooler to clean up connections before putting them back into the pool.
> However, this query needs a transaction ID to commit normally in autocommit, and it cannot obtain one because the pooler does not send down a transaction ID at session end.
> LOG: statement: DISCARD ALL;
> DEBUG: Local snapshot is built, xmin: 10003, xmax: 10003, xcnt: 0, RecentGlobalXmin: 10003
> STATEMENT: DISCARD ALL;
> LOG: Falling back to local Xid. Was = 0, now is = 10003
> STATEMENT: DISCARD ALL;
> DEBUG: Record transaction commit 10003
>
> I am thinking about the following solution:
> adding a new session parameter that can force the backends of a session to get a GXID from GTM, to ensure that the commit ID is unique in the cluster.
> The attached patch implements that, but it does not seem to work yet.
>
> Any thoughts?
>
> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-30 04:08:24
|
A new idea to solve this issue came to my mind:
destroy the connection slot on the pooler if temporary objects are on it. This will clean up the backends correctly, I think.
This is perhaps the easiest way to do it; it is clean, but it may impact performance for applications using a lot of temporary objects, as each session will close the connections to the other datanodes to clean everything up.

On Fri, Sep 30, 2011 at 11:54 AM, Michael Paquier <mic...@gm...> wrote:

> I think I found the origin of the problem.
> When ending a session, a DISCARD query is automatically run from the pooler to clean up connections before putting them back into the pool.
> However, this query needs a transaction ID to commit normally in autocommit, and it cannot obtain one because the pooler does not send down a transaction ID at session end.
> LOG: statement: DISCARD ALL;
> DEBUG: Local snapshot is built, xmin: 10003, xmax: 10003, xcnt: 0, RecentGlobalXmin: 10003
> STATEMENT: DISCARD ALL;
> LOG: Falling back to local Xid. Was = 0, now is = 10003
> STATEMENT: DISCARD ALL;
> DEBUG: Record transaction commit 10003
>
> I am thinking about the following solution:
> adding a new session parameter that can force the backends of a session to get a GXID from GTM, to ensure that the commit ID is unique in the cluster.
> The attached patch implements that, but it does not seem to work yet.
>
> Any thoughts?
>
> --
> Michael Paquier
> http://michael.otacoo.com

--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-30 02:54:55
|
I think I found the origin of the problem.
When ending a session, a DISCARD query is automatically run from the pooler to clean up connections before putting them back into the pool.
However, this query needs a transaction ID to commit normally in autocommit, and it cannot obtain one because the pooler does not send down a transaction ID at session end.

LOG: statement: DISCARD ALL;
DEBUG: Local snapshot is built, xmin: 10003, xmax: 10003, xcnt: 0, RecentGlobalXmin: 10003
STATEMENT: DISCARD ALL;
LOG: Falling back to local Xid. Was = 0, now is = 10003
STATEMENT: DISCARD ALL;
DEBUG: Record transaction commit 10003

I am thinking about the following solution:
adding a new session parameter that can force the backends of a session to get a GXID from GTM, to ensure that the commit ID is unique in the cluster.
The attached patch implements that, but it does not seem to work yet.

Any thoughts?
--
Michael Paquier
http://michael.otacoo.com
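A toy sketch of the idea, assuming a hypothetical session parameter; none of the names below (require_gxid_for_cleanup, assign_commit_xid, the counters) exist in the actual patch or in XC, they only illustrate the GXID-versus-local-Xid choice described above.

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Stand-ins for "next GXID handed out by GTM" and "next local Xid". */
static TransactionId next_gtm_gxid  = 10003;
static TransactionId next_local_xid = 10003;

/* Proposed (hypothetical) session parameter: force the session-end cleanup
 * query to commit under a GTM-assigned GXID instead of a local Xid. */
static bool require_gxid_for_cleanup = true;

static TransactionId
assign_commit_xid(void)
{
    if (require_gxid_for_cleanup)
        return next_gtm_gxid++;   /* unique across the whole cluster */
    return next_local_xid++;      /* local fallback: may collide cluster-wide */
}

int
main(void)
{
    printf("DISCARD ALL would commit with xid %u\n",
           (unsigned) assign_commit_xid());
    return 0;
}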
From: Michael P. <mic...@gm...> - 2011-09-30 01:11:24
|
I have been able to isolate the error: it can happen in multiple places, but its origin looks to be COMMIT PREPARED. After a couple of tests, I found that COMMIT PREPARED fails with a clog issue:

#2 0x00000000008516da in ExceptionalCondition (conditionName=0x8c4ff8 "!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", errorType=0x8c4ec7 "FailedAssertion", fileName=0x8c4ec0 "clog.c", lineNumber=358) at assert.c:57
#3 0x00000000004b4be9 in TransactionIdSetStatusBit (xid=20844, status=1, lsn=..., slotno=0) at clog.c:355
#4 0x00000000004b4a64 in TransactionIdSetPageStatus (xid=20844, nsubxids=0, subxids=0x2ca1440, status=1, lsn=..., pageno=0) at clog.c:309
#5 0x00000000004b47c3 in TransactionIdSetTreeStatus (xid=20844, nsubxids=0, subxids=0x2ca1440, status=1, lsn=...) at clog.c:182
#6 0x00000000004b563d in TransactionIdCommitTree (xid=20844, nxids=0, xids=0x2ca1440) at transam.c:266
#7 0x00000000004d9e4c in RecordTransactionCommitPrepared (xid=20844, nchildren=0, children=0x2ca1440, nrels=0, rels=0x2ca1440, ninvalmsgs=2, invalmsgs=0x2ca1440, initfileinval=0 '\000') at twophase.c:2043
#8 0x00000000004d8713 in FinishPreparedTransaction (gid=0x2d5fa50 "T20844", isCommit=1 '\001') at twophase.c:1308
#9 0x00000000007555ab in standard_ProcessUtility (parsetree=0x2d5fa70, queryString=0x2d5f058 "COMMIT PREPARED 'T20844'", params=0x0, isTopLevel=1 '\001', dest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at utility.c:530
#10 0x00000000007550ee in ProcessUtility (parsetree=0x2d5fa70, queryString=0x2d5f058 "COMMIT PREPARED 'T20844'", params=0x0, isTopLevel=1 '\001', dest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at utility.c:354
#11 0x0000000000753f30 in PortalRunUtility (portal=0x2ca3c48, utilityStmt=0x2d5fa70, isTopLevel=1 '\001', dest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at pquery.c:1218
#12 0x00000000007541c1 in PortalRunMulti (portal=0x2ca3c48, isTopLevel=1 '\001', dest=0x2d5fdf8, altdest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at pquery.c:1362
#13 0x0000000000753641 in PortalRun (portal=0x2ca3c48, count=9223372036854775807, isTopLevel=1 '\001', dest=0x2d5fdf8, altdest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at pquery.c:843
#14 0x000000000074d017 in exec_simple_query (query_string=0x2d5f058 "COMMIT PREPARED 'T20844'") at postgres.c:1088
#15 0x00000000007514c2 in PostgresMain (argc=2, argv=0x2c85c80, username=0x2c85c00 "michael") at postgres.c:4105
#16 0x00000000006f791b in BackendRun (port=0x2cb4f50) at postmaster.c:3786
#17 0x00000000006f6f79 in BackendStartup (port=0x2cb4f50) at postmaster.c:3466
#18 0x00000000006f3e0a in ServerLoop () at postmaster.c:1530
#19 0x00000000006f35ab in PostmasterMain (argc=7, argv=0x2c82b60) at postmaster.c:1191
#20 0x000000000065efa9 in main (argc=7, argv=0x2c82b60) at main.c:199

The real issue here looks to be the commit tree acting weirdly at COMMIT PREPARED. After the first crash, there is an additional behavior: Datanode servers usually restart, but they enter an inconsistent state and stop abruptly at recovery:

#0 0x00007f728abd9a75 in raise () from /lib/libc.so.6
#1 0x00007f728abdd5c0 in abort () from /lib/libc.so.6
#2 0x00000000008516da in ExceptionalCondition (conditionName=0x8c4ff8 "!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", errorType=0x8c4ec7 "FailedAssertion", fileName=0x8c4ec0 "clog.c", lineNumber=358) at assert.c:57
#3 0x00000000004b4be9 in TransactionIdSetStatusBit (xid=20844, status=1, lsn=..., slotno=0) at clog.c:355
#4 0x00000000004b4a64 in TransactionIdSetPageStatus (xid=20844, nsubxids=0, subxids=0x2cc4ef8, status=1, lsn=..., pageno=0) at clog.c:309
#5 0x00000000004b47c3 in TransactionIdSetTreeStatus (xid=20844, nsubxids=0, subxids=0x2cc4ef8, status=1, lsn=...) at clog.c:182
#6 0x00000000004b563d in TransactionIdCommitTree (xid=20844, nxids=0, xids=0x2cc4ef8) at transam.c:266
#7 0x00000000004bbb8f in xact_redo_commit (xlrec=0x2cc4ed8, xid=20844, lsn=...) at xact.c:5074
#8 0x00000000004bc038 in xact_redo (lsn=..., record=0x2cc4eb0) at xact.c:5275
#9 0x00000000004c9e72 in StartupXLOG () at xlog.c:6665
#10 0x00000000004d02ff in StartupProcessMain () at xlog.c:10069
#11 0x00000000004f87c3 in AuxiliaryProcessMain (argc=2, argv=0x7fff51668760) at bootstrap.c:434
#12 0x00000000006f7f7a in StartChildProcess (type=StartupProcess) at postmaster.c:4684
#13 0x00000000006f6b39 in PostmasterStateMachine () at postmaster.c:3275
#14 0x00000000006f5c7a in reaper (postgres_signal_arg=17) at postmaster.c:2726
#15 <signal handler called>
#16 0x00007f728ac84fd3 in select () from /lib/libc.so.6
#17 0x00000000006f3cd9 in ServerLoop () at postmaster.c:1490
#18 0x00000000006f35ab in PostmasterMain (argc=7, argv=0x2c90b60) at postmaster.c:1191
#19 0x000000000065efa9 in main (argc=7, argv=0x2c90b60) at main.c:199

This analysis is in progress, but I have an idea of the origin.
--
Michael Paquier
http://michael.otacoo.com
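For reference, the failing assertion checks that a transaction's clog status bits only move through legal transitions. The standalone sketch below merely mirrors the predicate visible in the backtrace (it is an illustration, not the server code); the crash means xid 20844 already carried a conflicting status when COMMIT PREPARED, and later WAL redo, tried to mark it committed (status=1 in frame #3).

#include <stdio.h>

/* Transaction status values as used by clog.c (0x00..0x03). */
#define TRANSACTION_STATUS_IN_PROGRESS   0x00
#define TRANSACTION_STATUS_COMMITTED     0x01
#define TRANSACTION_STATUS_ABORTED       0x02
#define TRANSACTION_STATUS_SUB_COMMITTED 0x03

/* Mirrors the predicate inside the failing assertion: setting the bits is
 * only legal if the slot is still "in progress", is a sub-committed entry
 * being finalized, or already holds the same status. */
static int
clog_transition_is_legal(int curval, int status)
{
    return curval == TRANSACTION_STATUS_IN_PROGRESS ||
           (curval == TRANSACTION_STATUS_SUB_COMMITTED &&
            status != TRANSACTION_STATUS_IN_PROGRESS) ||
           curval == status;
}

int
main(void)
{
    /* e.g. a slot already marked ABORTED cannot be re-marked COMMITTED,
     * which is the kind of conflict this assertion trips on. */
    printf("aborted -> committed legal? %d\n",
           clog_transition_is_legal(TRANSACTION_STATUS_ABORTED,
                                    TRANSACTION_STATUS_COMMITTED));
    return 0;
}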
From: Michael P. <mic...@gm...> - 2011-09-29 23:29:09
|
On Thu, Sep 29, 2011 at 2:31 PM, Ashutosh Bapat <ash...@en...> wrote:

> If we kind of know the area where the problems are, it will help to fix the bug, so that regressions are crash-free. I will need to depend upon the regression a lot for the cleanup. Is it possible to fix the problem soon?

To be honest, I am not sure. I would first need to find the origin of the problem, and I am not really sure that is easy. Let me have a shot at it, though.
--
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2011-09-29 05:31:32
|
If we kind of know the area where the problems are, it will help to fix the bug, so that regressions are crash free. I will need to depend upon the regression a lot for the cleanup. Is it possible to fix the problem soon? On Thu, Sep 29, 2011 at 9:00 AM, Michael Paquier <mic...@gm...>wrote: > On Thu, Sep 29, 2011 at 12:22 PM, Pavan Deolasee < > pav...@en...> wrote: > >> On Thu, Sep 29, 2011 at 8:00 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> On Thu, Sep 29, 2011 at 11:25 AM, Pavan Deolasee < >>> pav...@en...> wrote: >>> >>>> >>>> Could this be because the way we save and restore the GTM info ? I have >>>> seen issues because of that, especially if we fail to shutdown everything >>>> properly. >>>> >>> This is indeed possible. Now snapshot data from GTM is saved with malloc >>> on Datanodes, and we do not use any *safe* palloc mechanism. >>> >> >> No, you got me wrong. I was talking about the mechanism to save the GTM >> state in a file when GTM is shutdown. We then restore from the saved >> information at restart. That sometimes cause problem, especially if we have >> reinitialized the cluster. But I don't think make installcheck does that, so >> may be this is not the issue. >> > OK, there may be issues related that. But I am also able to reproduce the > problem with the 1st regression on a clean cluster from time to time. > > Michael > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2011-09-29 04:32:07
|
Hi all,

I am preparing a sub-release based on the 0.9.5 stable branch. Compared to 0.9.5, this release contains some fixes regarding performance, and it includes all the commits done in PostgreSQL 9.0 stable up to now. Regressions and performance are not impacted at all, so I will commit this to the 0.9.5 stable branch if there are no objections.

Regards,
--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-29 03:31:07
|
On Thu, Sep 29, 2011 at 12:22 PM, Pavan Deolasee < pav...@en...> wrote: > On Thu, Sep 29, 2011 at 8:00 AM, Michael Paquier < > mic...@gm...> wrote: > >> On Thu, Sep 29, 2011 at 11:25 AM, Pavan Deolasee < >> pav...@en...> wrote: >> >>> >>> Could this be because the way we save and restore the GTM info ? I have >>> seen issues because of that, especially if we fail to shutdown everything >>> properly. >>> >> This is indeed possible. Now snapshot data from GTM is saved with malloc >> on Datanodes, and we do not use any *safe* palloc mechanism. >> > > No, you got me wrong. I was talking about the mechanism to save the GTM > state in a file when GTM is shutdown. We then restore from the saved > information at restart. That sometimes cause problem, especially if we have > reinitialized the cluster. But I don't think make installcheck does that, so > may be this is not the issue. > OK, there may be issues related that. But I am also able to reproduce the problem with the 1st regression on a clean cluster from time to time. Michael |
From: Pavan D. <pav...@en...> - 2011-09-29 03:22:45
|
On Thu, Sep 29, 2011 at 8:00 AM, Michael Paquier <mic...@gm...>wrote: > On Thu, Sep 29, 2011 at 11:25 AM, Pavan Deolasee < > pav...@en...> wrote: > >> >> Could this be because the way we save and restore the GTM info ? I have >> seen issues because of that, especially if we fail to shutdown everything >> properly. >> > This is indeed possible. Now snapshot data from GTM is saved with malloc on > Datanodes, and we do not use any *safe* palloc mechanism. > No, you got me wrong. I was talking about the mechanism to save the GTM state in a file when GTM is shutdown. We then restore from the saved information at restart. That sometimes cause problem, especially if we have reinitialized the cluster. But I don't think make installcheck does that, so may be this is not the issue. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com |
From: Pavan D. <pav...@en...> - 2011-09-29 02:33:37
|
Could this be because of the way we save and restore the GTM info? I have seen issues because of that, especially if we fail to shut down everything properly.

Thanks,
Pavan

On Thu, Sep 29, 2011 at 5:26 AM, Michael Paquier <mic...@gm...> wrote:

> Like in bug 3412062, there is a portion of memory that is reacting really weirdly.
> https://sourceforge.net/tracker/?func=detail&aid=3412062&group_id=311227&atid=1310232
> I suppose that those problems are not directly related, but the origin (memory management) may be the same.
>
> On Thu, Sep 29, 2011 at 8:45 AM, Michael Paquier <mic...@gm...> wrote:
>
>> I am able to reproduce this issue, but I am not sure what it is related to, as it happens randomly.
>> As you say, having a tuple concurrently updated would mean a lock or a snapshot problem.
>> GTM has always worked correctly, so locks?
>>
>> On Wed, Sep 28, 2011 at 8:16 PM, Ashutosh Bapat <ash...@en...> wrote:
>>
>>> Here's the assertion that's failing:
>>> 72 FATAL: tuple concurrently updated
>>> 73 TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 358)
>>> 74 LOG: server process (PID 32506) was terminated by signal 6: Aborted
>>> 75 LOG: terminating any other active server processes
>
> --
> Michael Paquier
> http://michael.otacoo.com

--
Pavan Deolasee
EnterpriseDB
http://www.enterprisedb.com
From: Michael P. <mic...@gm...> - 2011-09-29 02:30:32
|
On Thu, Sep 29, 2011 at 11:25 AM, Pavan Deolasee < pav...@en...> wrote: > > Could this be because the way we save and restore the GTM info ? I have > seen issues because of that, especially if we fail to shutdown everything > properly. > This is indeed possible. Now snapshot data from GTM is saved with malloc on Datanodes, and we do not use any *safe* palloc mechanism. I saw this assertion crash only on remote nodes, both Coordinator and Datanodes, so this may be related to the way data is received on remote node from Coordinator. My question is: why do we use malloc to store snapshot info received on remote node? Is it related to restrictions on sessions? -- Michael Paquier http://michael.otacoo.com |
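A minimal sketch of the alternative being hinted at, assuming it is compiled as part of the backend: keep the snapshot payload received from the Coordinator in a long-lived memory context instead of raw malloc, so the allocation stays under the backend's memory machinery. The names save_remote_snapshot, remote_snapshot_data and remote_snapshot_size are illustrative stand-ins, not the actual XC code.

#include "postgres.h"
#include "utils/memutils.h"

/* Illustrative only: cache the snapshot payload received from the
 * Coordinator in TopMemoryContext so it lives for the whole session
 * while still being tracked like any other palloc'd memory. */
static char *remote_snapshot_data = NULL;
static Size  remote_snapshot_size = 0;

static void
save_remote_snapshot(const char *buf, Size len)
{
    if (remote_snapshot_data != NULL)
        pfree(remote_snapshot_data);

    remote_snapshot_data = MemoryContextAlloc(TopMemoryContext, len);
    memcpy(remote_snapshot_data, buf, len);
    remote_snapshot_size = len;
}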
From: Michael P. <mic...@gm...> - 2011-09-28 23:56:30
|
Like in bug 3412062, there is a portion of memory that is reacting really weirdly.
https://sourceforge.net/tracker/?func=detail&aid=3412062&group_id=311227&atid=1310232
I suppose that those problems are not directly related, but the origin (memory management) may be the same.

On Thu, Sep 29, 2011 at 8:45 AM, Michael Paquier <mic...@gm...> wrote:

> I am able to reproduce this issue, but I am not sure what it is related to, as it happens randomly.
> As you say, having a tuple concurrently updated would mean a lock or a snapshot problem.
> GTM has always worked correctly, so locks?
>
> On Wed, Sep 28, 2011 at 8:16 PM, Ashutosh Bapat <ash...@en...> wrote:
>
>> Here's the assertion that's failing:
>> 72 FATAL: tuple concurrently updated
>> 73 TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 358)
>> 74 LOG: server process (PID 32506) was terminated by signal 6: Aborted
>> 75 LOG: terminating any other active server processes

--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-28 23:45:28
|
I am able to reproduce this issue, but I am not sure what it is related to, as it happens randomly.
As you say, having a tuple concurrently updated would mean a lock or a snapshot problem.
GTM has always worked correctly, so locks?

On Wed, Sep 28, 2011 at 8:16 PM, Ashutosh Bapat <ash...@en...> wrote:

> Here's the assertion that's failing:
> 72 FATAL: tuple concurrently updated
> 73 TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 358)
> 74 LOG: server process (PID 32506) was terminated by signal 6: Aborted
> 75 LOG: terminating any other active server processes

--
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2011-09-28 11:16:43
|
Here's the assertion that's failing:

72 FATAL: tuple concurrently updated
73 TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 358)
74 LOG: server process (PID 32506) was terminated by signal 6: Aborted
75 LOG: terminating any other active server processes

On Wed, Sep 28, 2011 at 4:32 PM, Ashutosh Bapat <ash...@en...> wrote:

> On Wed, Sep 28, 2011 at 4:12 PM, Ashutosh Bapat <ash...@en...> wrote:
>
>> There is something weird going on with regression runs. I was trying to understand the symptoms for quite some time today. I have at least succeeded in finding out what's needed to have regression runs without a crash.
>>
>> If I run the regression suite (make installcheck) the first time, it runs well, without any crashes. If I run it again, without shutting down the servers, it crashes. The only time I get a run without any crash is when I do the following steps:
>>
>> clean make (from the root directory) (I think it has to do with the installation)
>> build the data clusters again
>> boot the servers
>> make installcheck
>>
> I forgot: before you build the data clusters, you need to remove the existing ones.
>
>> The crash is the well-known crash related to snapshots (I have lost the error log though). Do we change something installed during make installcheck?
>>
>> On Mon, Sep 26, 2011 at 8:53 AM, Michael Paquier <mic...@gm...> wrote:
>>
>>> Hi all,
>>>
>>> Please find attached the latest regression results.
>>> 34 tests failed out of 130 tests.
>>>
>>> Regards,
>>> --
>>> Michael Paquier
>>> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Ashutosh B. <ash...@en...> - 2011-09-28 11:02:24
|
On Wed, Sep 28, 2011 at 4:12 PM, Ashutosh Bapat <ash...@en...> wrote:

> There is something weird going on with regression runs. I was trying to understand the symptoms for quite some time today. I have at least succeeded in finding out what's needed to have regression runs without a crash.
>
> If I run the regression suite (make installcheck) the first time, it runs well, without any crashes. If I run it again, without shutting down the servers, it crashes. The only time I get a run without any crash is when I do the following steps:
>
> clean make (from the root directory) (I think it has to do with the installation)
> build the data clusters again
> boot the servers
> make installcheck
>

I forgot: before you build the data clusters, you need to remove the existing ones.

> The crash is the well-known crash related to snapshots (I have lost the error log though). Do we change something installed during make installcheck?
>
> On Mon, Sep 26, 2011 at 8:53 AM, Michael Paquier <mic...@gm...> wrote:
>
>> Hi all,
>>
>> Please find attached the latest regression results.
>> 34 tests failed out of 130 tests.
>>
>> Regards,
>> --
>> Michael Paquier
>> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Ashutosh B. <ash...@en...> - 2011-09-28 10:43:00
|
There is something weird going on with regression runs. I was trying to understand the symptoms for quite some time today. I have at least succeeded in finding out what's needed to have regression runs without a crash.

If I run the regression suite (make installcheck) the first time, it runs well, without any crashes. If I run it again, without shutting down the servers, it crashes. The only time I get a run without any crash is when I do the following steps:

clean make (from the root directory) (I think it has to do with the installation)
build the data clusters again
boot the servers
make installcheck

The crash is the well-known crash related to snapshots (I have lost the error log though). Do we change something installed during make installcheck?

On Mon, Sep 26, 2011 at 8:53 AM, Michael Paquier <mic...@gm...> wrote:

> Hi all,
>
> Please find attached the latest regression results.
> 34 tests failed out of 130 tests.
>
> Regards,
> --
> Michael Paquier
> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2011-09-16 02:05:55
|
On Fri, Sep 16, 2011 at 5:28 AM, Koichi Suzuki <koi...@gm...> wrote:

> I found that two features were changed in the current master.
>
> 1) Datanodes now need a pooler port number.

The pooler has always been a Coordinator process. It has never been needed by Datanodes.

> 2) Options for pg_ctl and postgres changed.
> pg_ctl: the -S option is now the -Z option.

It is true that this was changed after the 9.1 merge; the attached patch corrects that.

> postgres: the -S option is replaced as of 9.0 and later. To control whether a coordinator or a datanode is started, we need to use the -C or -X option.

For postgres, the options have always been the same:
- -X for a Datanode
- -C for a Coordinator

I checked the docs (postgres-ref.sgmlin) and they are correct.
--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-15 20:40:45
|
On Fri, Sep 16, 2011 at 5:23 AM, Koichi Suzuki <koi...@gm...> wrote: > I see the problem. It may add some overhead to create temp table to > all the other coordinator but I hope this does not a problem. > This is not a problem I think, even if there will be additional connections between nodes through the pooler. -- Michael Paquier http://michael.otacoo.com |
From: Koichi S. <koi...@gm...> - 2011-09-15 20:28:50
|
I found that two features were changed in the current master.

1) Datanodes now need a pooler port number.
2) Options for pg_ctl and postgres changed.
   pg_ctl: the -S option is now the -Z option.
   postgres: the -S option is replaced as of 9.0 and later. To control whether a coordinator or a datanode is started, we need to use the -C or -X option.

We need to change the documentation as well.

Regards;
----------
Koichi Suzuki
From: Koichi S. <koi...@gm...> - 2011-09-15 20:23:56
|
I see the problem. It may add some overhead to create the temp table on all the other coordinators, but I hope this is not a problem.

----------
Koichi Suzuki

2011/9/15 Michael Paquier <mic...@gm...>:

> Hi all,
>
> While playing with temporary tables, I found an issue when trying to use a LIKE on a temporary table to create a non-temporary table.
>
> template1=# create temp table aa (a int);
> CREATE TABLE
> template1=# create table bb (like aa);
> ERROR: relation "aa" does not exist
>
> The origin of this problem is that a temporary table is only created on the local coordinator and on all the datanodes.
> This could be solved by enforcing temp table creation on all the nodes.
> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-15 19:31:47
|
Hi all,

While playing with temporary tables, I found an issue when trying to use a LIKE on a temporary table to create a non-temporary table.

template1=# create temp table aa (a int);
CREATE TABLE
template1=# create table bb (like aa);
ERROR: relation "aa" does not exist

The origin of this problem is that a temporary table is only created on the local coordinator and on all the datanodes.
This could be solved by enforcing temp table creation on all the nodes.
--
Michael Paquier
http://michael.otacoo.com
From: Koichi S. <koi...@gm...> - 2011-09-05 01:03:23
|
I'm afraid this was caused by Listen/Notify/Unlisten, which we don't support yet.

----------
Koichi Suzuki

2011/9/5 Michael Paquier <mic...@gm...>:

> Hi all,
>
> When running make check, it is possible that the XC cluster freezes, waiting for a lock held by another transaction. Presumably a 2PC lock.
> 2PC is mandatory for write transactions involving more than 2 nodes, and when a regression test issues a COMMIT, it is possible that this becomes a 2PC if a DDL has been launched.
>
> I suggest we identify the test cases that may conflict with 2PC locks and not parallelize them.
> What do you think?
> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-05 00:47:00
|
Hi all,

When running make check, it is possible that the XC cluster freezes, waiting for a lock held by another transaction. Presumably a 2PC lock.
2PC is mandatory for write transactions involving more than 2 nodes, and when a regression test issues a COMMIT, it is possible that this becomes a 2PC if a DDL has been launched.

I suggest we identify the test cases that may conflict with 2PC locks and not parallelize them.
What do you think?
--
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2011-09-02 08:43:28
|
Hmm, OK. Please commit the patch. It would be nice to write down somewhere how we choose different ports and directories for the various coordinators and datanodes.

On Fri, Sep 2, 2011 at 1:31 PM, Michael Paquier <mic...@gm...> wrote:

> On Fri, Sep 2, 2011 at 3:32 PM, Ashutosh Bapat <ash...@en...> wrote:
>
>> Hi Michael,
>> Sorry for the delay.
>> Here are my comments:
>> 1. This patch adds xc_groupby xc_distkey xc_having xc_temp to the parallel schedule. Does that mean that these tests will be run simultaneously? If so, we have to make sure that they create tables/objects with different names so that they do not conflict with each other. Are we going to use a second coordinator for firing the parallel test case?
>
> Those test cases run OK in parallel.
>
>> 2. It may be better to separate the XC-specific code into a separate C file, pgxc_regress.c or something, and call it from pg_regress.c. That way pg_regress will remain clean.
>
> I am not sure that is the way to do it.
> In order to keep the code compact I wrote a lot of functions that use static variables of pg_regress.c, as the same operations are repeated several times.
> If I put the additional functions in another file I will have to export those variables or add extra arguments when calling the external APIs.
> This may make the code heavier rather than lighter.
> --
> Michael Paquier
> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2011-09-02 08:02:06
|
On Fri, Sep 2, 2011 at 3:32 PM, Ashutosh Bapat <ash...@en...> wrote:

> Hi Michael,
> Sorry for the delay.
> Here are my comments:
> 1. This patch adds xc_groupby xc_distkey xc_having xc_temp to the parallel schedule. Does that mean that these tests will be run simultaneously? If so, we have to make sure that they create tables/objects with different names so that they do not conflict with each other. Are we going to use a second coordinator for firing the parallel test case?

Those test cases run OK in parallel.

> 2. It may be better to separate the XC-specific code into a separate C file, pgxc_regress.c or something, and call it from pg_regress.c. That way pg_regress will remain clean.

I am not sure that is the way to do it.
In order to keep the code compact I wrote a lot of functions that use static variables of pg_regress.c, as the same operations are repeated several times.
If I put the additional functions in another file I will have to export those variables or add extra arguments when calling the external APIs.
This may make the code heavier rather than lighter.
--
Michael Paquier
http://michael.otacoo.com
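A rough sketch of the alternative being suggested, under the assumption that the shared pg_regress.c state could be bundled into one context struct instead of exporting each static variable; every name here is illustrative, not existing code.

/* pgxc_regress.h (hypothetical) */
typedef struct RegressContext
{
    const char *bindir;            /* stand-ins for pg_regress.c statics */
    const char *temp_install;
    int         coordinator_count;
    int         datanode_count;
    int         base_port;
} RegressContext;

/* XC-specific helpers would move to pgxc_regress.c and take the context
 * explicitly, so pg_regress.c only needs to pass one pointer instead of
 * exposing its static variables or growing long argument lists. */
extern void pgxc_init_cluster(const RegressContext *ctx);
extern void pgxc_start_nodes(const RegressContext *ctx);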