From: Koichi S. <koi...@gm...> - 2011-09-30 04:50:30
|
Is there any statement to start a new session holding the connection? I'm writing from a bus from the airport and don't have a reference at hand, though.
---
Koichi Suzuki

On 2011/09/30, at 13:08, Michael Paquier <mic...@gm...> wrote:

> A new idea to solve this issue came to my mind:
> destroy the connection slot on the pooler if temporary objects are on it. This will clean up the backends correctly, I think.
> This is perhaps the easiest way to do it; it is clean, but it may impact performance for applications using a lot of temporary objects, as each session will close the connections to the other datanodes to clean everything up.
>
> On Fri, Sep 30, 2011 at 11:54 AM, Michael Paquier <mic...@gm...> wrote:
> I think I found the origin of the problem.
> When ending a session, a DISCARD query is automatically run from the pooler to clean up connections before putting them back into the pool.
> However, this query needs a transaction ID to commit normally in autocommit, and it cannot obtain one because the pooler does not send down a transaction ID at session end.
> LOG: statement: DISCARD ALL;
> DEBUG: Local snapshot is built, xmin: 10003, xmax: 10003, xcnt: 0, RecentGlobalXmin: 10003
> STATEMENT: DISCARD ALL;
> LOG: Falling back to local Xid. Was = 0, now is = 10003
> STATEMENT: DISCARD ALL;
> DEBUG: Record transaction commit 10003
>
> I am thinking about the following solution:
> adding a new session parameter that can force the backends of a session to get a GXID from GTM, to ensure that the commit ID is unique in the cluster.
> The attached patch implements that, but it does not seem to work yet.
>
> Any thoughts?
>
> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-30 04:08:24
|
A new idea to solve this issue came to my mind:
destroy the connection slot on the pooler if temporary objects are on it. This will clean up the backends correctly, I think.
This is perhaps the easiest way to do it; it is clean, but it may impact performance for applications using a lot of temporary objects, as each session will close the connections to the other datanodes to clean everything up.

On Fri, Sep 30, 2011 at 11:54 AM, Michael Paquier <mic...@gm...> wrote:

> I think I found the origin of the problem.
> When ending a session, a DISCARD query is automatically run from the pooler to clean up connections before putting them back into the pool.
> However, this query needs a transaction ID to commit normally in autocommit, and it cannot obtain one because the pooler does not send down a transaction ID at session end.
> LOG: statement: DISCARD ALL;
> DEBUG: Local snapshot is built, xmin: 10003, xmax: 10003, xcnt: 0, RecentGlobalXmin: 10003
> STATEMENT: DISCARD ALL;
> LOG: Falling back to local Xid. Was = 0, now is = 10003
> STATEMENT: DISCARD ALL;
> DEBUG: Record transaction commit 10003
>
> I am thinking about the following solution:
> adding a new session parameter that can force the backends of a session to get a GXID from GTM, to ensure that the commit ID is unique in the cluster.
> The attached patch implements that, but it does not seem to work yet.
>
> Any thoughts?
>
> --
> Michael Paquier
> http://michael.otacoo.com

--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-30 02:54:55
|
I think I found the origin of the problem.
When ending a session, a DISCARD query is automatically run from the pooler to clean up connections before putting them back into the pool.
However, this query needs a transaction ID to commit normally in autocommit, and it cannot obtain one because the pooler does not send down a transaction ID at session end.

LOG: statement: DISCARD ALL;
DEBUG: Local snapshot is built, xmin: 10003, xmax: 10003, xcnt: 0, RecentGlobalXmin: 10003
STATEMENT: DISCARD ALL;
LOG: Falling back to local Xid. Was = 0, now is = 10003
STATEMENT: DISCARD ALL;
DEBUG: Record transaction commit 10003

I am thinking about the following solution:
adding a new session parameter that can force the backends of a session to get a GXID from GTM, to ensure that the commit ID is unique in the cluster.
The attached patch implements that, but it does not seem to work yet.

Any thoughts?
--
Michael Paquier
http://michael.otacoo.com
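A toy sketch of the idea, assuming a hypothetical session parameter; none of the names below (require_gxid_for_cleanup, assign_commit_xid, the counters) exist in the actual patch or in XC, they only illustrate the GXID-versus-local-Xid choice described above.

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Stand-ins for "next GXID handed out by GTM" and "next local Xid". */
static TransactionId next_gtm_gxid  = 10003;
static TransactionId next_local_xid = 10003;

/* Proposed (hypothetical) session parameter: force the session-end cleanup
 * query to commit under a GTM-assigned GXID instead of a local Xid. */
static bool require_gxid_for_cleanup = true;

static TransactionId
assign_commit_xid(void)
{
    if (require_gxid_for_cleanup)
        return next_gtm_gxid++;   /* unique across the whole cluster */
    return next_local_xid++;      /* local fallback: may collide cluster-wide */
}

int
main(void)
{
    printf("DISCARD ALL would commit with xid %u\n",
           (unsigned) assign_commit_xid());
    return 0;
}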
From: Michael P. <mic...@gm...> - 2011-09-30 01:11:24
|
I have been able to isolate the error: it can happen in multiple places, but its origin looks to be COMMIT PREPARED. After a couple of tests, I found that COMMIT PREPARED fails with a clog issue:

#2 0x00000000008516da in ExceptionalCondition (conditionName=0x8c4ff8 "!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", errorType=0x8c4ec7 "FailedAssertion", fileName=0x8c4ec0 "clog.c", lineNumber=358) at assert.c:57
#3 0x00000000004b4be9 in TransactionIdSetStatusBit (xid=20844, status=1, lsn=..., slotno=0) at clog.c:355
#4 0x00000000004b4a64 in TransactionIdSetPageStatus (xid=20844, nsubxids=0, subxids=0x2ca1440, status=1, lsn=..., pageno=0) at clog.c:309
#5 0x00000000004b47c3 in TransactionIdSetTreeStatus (xid=20844, nsubxids=0, subxids=0x2ca1440, status=1, lsn=...) at clog.c:182
#6 0x00000000004b563d in TransactionIdCommitTree (xid=20844, nxids=0, xids=0x2ca1440) at transam.c:266
#7 0x00000000004d9e4c in RecordTransactionCommitPrepared (xid=20844, nchildren=0, children=0x2ca1440, nrels=0, rels=0x2ca1440, ninvalmsgs=2, invalmsgs=0x2ca1440, initfileinval=0 '\000') at twophase.c:2043
#8 0x00000000004d8713 in FinishPreparedTransaction (gid=0x2d5fa50 "T20844", isCommit=1 '\001') at twophase.c:1308
#9 0x00000000007555ab in standard_ProcessUtility (parsetree=0x2d5fa70, queryString=0x2d5f058 "COMMIT PREPARED 'T20844'", params=0x0, isTopLevel=1 '\001', dest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at utility.c:530
#10 0x00000000007550ee in ProcessUtility (parsetree=0x2d5fa70, queryString=0x2d5f058 "COMMIT PREPARED 'T20844'", params=0x0, isTopLevel=1 '\001', dest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at utility.c:354
#11 0x0000000000753f30 in PortalRunUtility (portal=0x2ca3c48, utilityStmt=0x2d5fa70, isTopLevel=1 '\001', dest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at pquery.c:1218
#12 0x00000000007541c1 in PortalRunMulti (portal=0x2ca3c48, isTopLevel=1 '\001', dest=0x2d5fdf8, altdest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at pquery.c:1362
#13 0x0000000000753641 in PortalRun (portal=0x2ca3c48, count=9223372036854775807, isTopLevel=1 '\001', dest=0x2d5fdf8, altdest=0x2d5fdf8, completionTag=0x7fff41fe56e0 "") at pquery.c:843
#14 0x000000000074d017 in exec_simple_query (query_string=0x2d5f058 "COMMIT PREPARED 'T20844'") at postgres.c:1088
#15 0x00000000007514c2 in PostgresMain (argc=2, argv=0x2c85c80, username=0x2c85c00 "michael") at postgres.c:4105
#16 0x00000000006f791b in BackendRun (port=0x2cb4f50) at postmaster.c:3786
#17 0x00000000006f6f79 in BackendStartup (port=0x2cb4f50) at postmaster.c:3466
#18 0x00000000006f3e0a in ServerLoop () at postmaster.c:1530
#19 0x00000000006f35ab in PostmasterMain (argc=7, argv=0x2c82b60) at postmaster.c:1191
#20 0x000000000065efa9 in main (argc=7, argv=0x2c82b60) at main.c:199

The real issue here looks to be the commit tree acting weirdly at COMMIT PREPARED. After the first crash, there is an additional behavior: Datanode servers usually restart, but they enter an inconsistent state and stop abruptly at recovery:

#0 0x00007f728abd9a75 in raise () from /lib/libc.so.6
#1 0x00007f728abdd5c0 in abort () from /lib/libc.so.6
#2 0x00000000008516da in ExceptionalCondition (conditionName=0x8c4ff8 "!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", errorType=0x8c4ec7 "FailedAssertion", fileName=0x8c4ec0 "clog.c", lineNumber=358) at assert.c:57
#3 0x00000000004b4be9 in TransactionIdSetStatusBit (xid=20844, status=1, lsn=..., slotno=0) at clog.c:355
#4 0x00000000004b4a64 in TransactionIdSetPageStatus (xid=20844, nsubxids=0, subxids=0x2cc4ef8, status=1, lsn=..., pageno=0) at clog.c:309
#5 0x00000000004b47c3 in TransactionIdSetTreeStatus (xid=20844, nsubxids=0, subxids=0x2cc4ef8, status=1, lsn=...) at clog.c:182
#6 0x00000000004b563d in TransactionIdCommitTree (xid=20844, nxids=0, xids=0x2cc4ef8) at transam.c:266
#7 0x00000000004bbb8f in xact_redo_commit (xlrec=0x2cc4ed8, xid=20844, lsn=...) at xact.c:5074
#8 0x00000000004bc038 in xact_redo (lsn=..., record=0x2cc4eb0) at xact.c:5275
#9 0x00000000004c9e72 in StartupXLOG () at xlog.c:6665
#10 0x00000000004d02ff in StartupProcessMain () at xlog.c:10069
#11 0x00000000004f87c3 in AuxiliaryProcessMain (argc=2, argv=0x7fff51668760) at bootstrap.c:434
#12 0x00000000006f7f7a in StartChildProcess (type=StartupProcess) at postmaster.c:4684
#13 0x00000000006f6b39 in PostmasterStateMachine () at postmaster.c:3275
#14 0x00000000006f5c7a in reaper (postgres_signal_arg=17) at postmaster.c:2726
#15 <signal handler called>
#16 0x00007f728ac84fd3 in select () from /lib/libc.so.6
#17 0x00000000006f3cd9 in ServerLoop () at postmaster.c:1490
#18 0x00000000006f35ab in PostmasterMain (argc=7, argv=0x2c90b60) at postmaster.c:1191
#19 0x000000000065efa9 in main (argc=7, argv=0x2c90b60) at main.c:199

This analysis is in progress, but I have an idea of the origin.
--
Michael Paquier
http://michael.otacoo.com
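For reference, the failing assertion checks that a transaction's clog status bits only move through legal transitions. The standalone sketch below merely mirrors the predicate visible in the backtrace (it is an illustration, not the server code); the crash means xid 20844 already carried a conflicting status when COMMIT PREPARED, and later WAL redo, tried to mark it committed (status=1 in frame #3).

#include <stdio.h>

/* Transaction status values as used by clog.c (0x00..0x03). */
#define TRANSACTION_STATUS_IN_PROGRESS   0x00
#define TRANSACTION_STATUS_COMMITTED     0x01
#define TRANSACTION_STATUS_ABORTED       0x02
#define TRANSACTION_STATUS_SUB_COMMITTED 0x03

/* Mirrors the predicate inside the failing assertion: setting the bits is
 * only legal if the slot is still "in progress", is a sub-committed entry
 * being finalized, or already holds the same status. */
static int
clog_transition_is_legal(int curval, int status)
{
    return curval == TRANSACTION_STATUS_IN_PROGRESS ||
           (curval == TRANSACTION_STATUS_SUB_COMMITTED &&
            status != TRANSACTION_STATUS_IN_PROGRESS) ||
           curval == status;
}

int
main(void)
{
    /* e.g. a slot already marked ABORTED cannot be re-marked COMMITTED,
     * which is the kind of conflict this assertion trips on. */
    printf("aborted -> committed legal? %d\n",
           clog_transition_is_legal(TRANSACTION_STATUS_ABORTED,
                                    TRANSACTION_STATUS_COMMITTED));
    return 0;
}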
From: Michael P. <mic...@gm...> - 2011-09-29 23:29:09
|
On Thu, Sep 29, 2011 at 2:31 PM, Ashutosh Bapat <ash...@en...> wrote:

> If we kind of know the area where the problems are, it will help to fix the bug, so that regressions are crash-free. I will need to depend upon the regression a lot for the cleanup. Is it possible to fix the problem soon?

To be honest, I am not sure. I would first need to find the origin of the problem, and I am not really sure that is easy. Let me have a shot at it, though.
--
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2011-09-29 05:31:32
|
If we kind of know the area where the problems are, it will help to fix the bug, so that regressions are crash free. I will need to depend upon the regression a lot for the cleanup. Is it possible to fix the problem soon? On Thu, Sep 29, 2011 at 9:00 AM, Michael Paquier <mic...@gm...>wrote: > On Thu, Sep 29, 2011 at 12:22 PM, Pavan Deolasee < > pav...@en...> wrote: > >> On Thu, Sep 29, 2011 at 8:00 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> On Thu, Sep 29, 2011 at 11:25 AM, Pavan Deolasee < >>> pav...@en...> wrote: >>> >>>> >>>> Could this be because the way we save and restore the GTM info ? I have >>>> seen issues because of that, especially if we fail to shutdown everything >>>> properly. >>>> >>> This is indeed possible. Now snapshot data from GTM is saved with malloc >>> on Datanodes, and we do not use any *safe* palloc mechanism. >>> >> >> No, you got me wrong. I was talking about the mechanism to save the GTM >> state in a file when GTM is shutdown. We then restore from the saved >> information at restart. That sometimes cause problem, especially if we have >> reinitialized the cluster. But I don't think make installcheck does that, so >> may be this is not the issue. >> > OK, there may be issues related that. But I am also able to reproduce the > problem with the 1st regression on a clean cluster from time to time. > > Michael > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2011-09-29 04:32:07
|
Hi all,

I am preparing a sub-release based on the 0.9.5 stable branch. Compared to 0.9.5, this release contains some fixes regarding performance, and it includes all the commits done in PostgreSQL 9.0 stable up to now. Regressions and performance are not impacted at all, so I will commit this to the 0.9.5 stable branch if there are no objections.

Regards,
--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-29 03:31:07
|
On Thu, Sep 29, 2011 at 12:22 PM, Pavan Deolasee < pav...@en...> wrote: > On Thu, Sep 29, 2011 at 8:00 AM, Michael Paquier < > mic...@gm...> wrote: > >> On Thu, Sep 29, 2011 at 11:25 AM, Pavan Deolasee < >> pav...@en...> wrote: >> >>> >>> Could this be because the way we save and restore the GTM info ? I have >>> seen issues because of that, especially if we fail to shutdown everything >>> properly. >>> >> This is indeed possible. Now snapshot data from GTM is saved with malloc >> on Datanodes, and we do not use any *safe* palloc mechanism. >> > > No, you got me wrong. I was talking about the mechanism to save the GTM > state in a file when GTM is shutdown. We then restore from the saved > information at restart. That sometimes cause problem, especially if we have > reinitialized the cluster. But I don't think make installcheck does that, so > may be this is not the issue. > OK, there may be issues related that. But I am also able to reproduce the problem with the 1st regression on a clean cluster from time to time. Michael |
From: Pavan D. <pav...@en...> - 2011-09-29 03:22:45
|
On Thu, Sep 29, 2011 at 8:00 AM, Michael Paquier <mic...@gm...>wrote: > On Thu, Sep 29, 2011 at 11:25 AM, Pavan Deolasee < > pav...@en...> wrote: > >> >> Could this be because the way we save and restore the GTM info ? I have >> seen issues because of that, especially if we fail to shutdown everything >> properly. >> > This is indeed possible. Now snapshot data from GTM is saved with malloc on > Datanodes, and we do not use any *safe* palloc mechanism. > No, you got me wrong. I was talking about the mechanism to save the GTM state in a file when GTM is shutdown. We then restore from the saved information at restart. That sometimes cause problem, especially if we have reinitialized the cluster. But I don't think make installcheck does that, so may be this is not the issue. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com |
From: Pavan D. <pav...@en...> - 2011-09-29 02:33:37
|
Could this be because of the way we save and restore the GTM info? I have seen issues because of that, especially if we fail to shut down everything properly.

Thanks,
Pavan

On Thu, Sep 29, 2011 at 5:26 AM, Michael Paquier <mic...@gm...> wrote:

> Like in bug 3412062, there is a portion of memory that is reacting really weirdly.
> https://sourceforge.net/tracker/?func=detail&aid=3412062&group_id=311227&atid=1310232
> I suppose that those problems are not directly related, but the origin (memory management) may be the same.
>
> On Thu, Sep 29, 2011 at 8:45 AM, Michael Paquier <mic...@gm...> wrote:
>
>> I am able to reproduce this issue, but I am not sure what it is related to, as it happens randomly.
>> As you say, having a tuple concurrently updated would mean a lock or a snapshot problem.
>> GTM has always worked correctly, so locks?
>>
>> On Wed, Sep 28, 2011 at 8:16 PM, Ashutosh Bapat <ash...@en...> wrote:
>>
>>> Here's the assertion that's failing:
>>> 72 FATAL: tuple concurrently updated
>>> 73 TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 358)
>>> 74 LOG: server process (PID 32506) was terminated by signal 6: Aborted
>>> 75 LOG: terminating any other active server processes
>
> --
> Michael Paquier
> http://michael.otacoo.com

--
Pavan Deolasee
EnterpriseDB
http://www.enterprisedb.com
From: Michael P. <mic...@gm...> - 2011-09-29 02:30:32
|
On Thu, Sep 29, 2011 at 11:25 AM, Pavan Deolasee < pav...@en...> wrote: > > Could this be because the way we save and restore the GTM info ? I have > seen issues because of that, especially if we fail to shutdown everything > properly. > This is indeed possible. Now snapshot data from GTM is saved with malloc on Datanodes, and we do not use any *safe* palloc mechanism. I saw this assertion crash only on remote nodes, both Coordinator and Datanodes, so this may be related to the way data is received on remote node from Coordinator. My question is: why do we use malloc to store snapshot info received on remote node? Is it related to restrictions on sessions? -- Michael Paquier http://michael.otacoo.com |
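A minimal sketch of the alternative being hinted at, assuming it is compiled as part of the backend: keep the snapshot payload received from the Coordinator in a long-lived memory context instead of raw malloc, so the allocation stays under the backend's memory machinery. The names save_remote_snapshot, remote_snapshot_data and remote_snapshot_size are illustrative stand-ins, not the actual XC code.

#include "postgres.h"
#include "utils/memutils.h"

/* Illustrative only: cache the snapshot payload received from the
 * Coordinator in TopMemoryContext so it lives for the whole session
 * while still being tracked like any other palloc'd memory. */
static char *remote_snapshot_data = NULL;
static Size  remote_snapshot_size = 0;

static void
save_remote_snapshot(const char *buf, Size len)
{
    if (remote_snapshot_data != NULL)
        pfree(remote_snapshot_data);

    remote_snapshot_data = MemoryContextAlloc(TopMemoryContext, len);
    memcpy(remote_snapshot_data, buf, len);
    remote_snapshot_size = len;
}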
From: Michael P. <mic...@gm...> - 2011-09-28 23:56:30
|
Like in bug 3412062, there is a portion of memory that is reacting really weirdly.
https://sourceforge.net/tracker/?func=detail&aid=3412062&group_id=311227&atid=1310232
I suppose that those problems are not directly related, but the origin (memory management) may be the same.

On Thu, Sep 29, 2011 at 8:45 AM, Michael Paquier <mic...@gm...> wrote:

> I am able to reproduce this issue, but I am not sure what it is related to, as it happens randomly.
> As you say, having a tuple concurrently updated would mean a lock or a snapshot problem.
> GTM has always worked correctly, so locks?
>
> On Wed, Sep 28, 2011 at 8:16 PM, Ashutosh Bapat <ash...@en...> wrote:
>
>> Here's the assertion that's failing:
>> 72 FATAL: tuple concurrently updated
>> 73 TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 358)
>> 74 LOG: server process (PID 32506) was terminated by signal 6: Aborted
>> 75 LOG: terminating any other active server processes

--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-28 23:45:28
|
I am able to reproduce this issue, but I am not sure what it is related to, as it happens randomly.
As you say, having a tuple concurrently updated would mean a lock or a snapshot problem.
GTM has always worked correctly, so locks?

On Wed, Sep 28, 2011 at 8:16 PM, Ashutosh Bapat <ash...@en...> wrote:

> Here's the assertion that's failing:
> 72 FATAL: tuple concurrently updated
> 73 TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 358)
> 74 LOG: server process (PID 32506) was terminated by signal 6: Aborted
> 75 LOG: terminating any other active server processes

--
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2011-09-28 11:16:43
|
Here's the assertion that's failing:

72 FATAL: tuple concurrently updated
73 TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status != 0x00) || curval == status)", File: "clog.c", Line: 358)
74 LOG: server process (PID 32506) was terminated by signal 6: Aborted
75 LOG: terminating any other active server processes

On Wed, Sep 28, 2011 at 4:32 PM, Ashutosh Bapat <ash...@en...> wrote:

> On Wed, Sep 28, 2011 at 4:12 PM, Ashutosh Bapat <ash...@en...> wrote:
>
>> There is something weird going on with regression runs. I was trying to understand the symptoms for quite some time today. I have at least succeeded in finding out what's needed to have regression runs without a crash.
>>
>> If I run the regression suite (make installcheck) the first time, it runs well, without any crashes. If I run it again, without shutting down the servers, it crashes. The only time I get a run without any crash is when I do the following steps:
>>
>> clean make (from the root directory) (I think it has to do with the installation)
>> build the data clusters again
>> boot the servers
>> make installcheck
>>
> I forgot: before you build the data clusters, you need to remove the existing ones.
>
>> The crash is the well-known crash related to snapshots (I have lost the error log though). Do we change something installed during make installcheck?
>>
>> On Mon, Sep 26, 2011 at 8:53 AM, Michael Paquier <mic...@gm...> wrote:
>>
>>> Hi all,
>>>
>>> Please find attached the latest regression results.
>>> 34 tests failed out of 130 tests.
>>>
>>> Regards,
>>> --
>>> Michael Paquier
>>> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Ashutosh B. <ash...@en...> - 2011-09-28 11:02:24
|
On Wed, Sep 28, 2011 at 4:12 PM, Ashutosh Bapat <ash...@en...> wrote:

> There is something weird going on with regression runs. I was trying to understand the symptoms for quite some time today. I have at least succeeded in finding out what's needed to have regression runs without a crash.
>
> If I run the regression suite (make installcheck) the first time, it runs well, without any crashes. If I run it again, without shutting down the servers, it crashes. The only time I get a run without any crash is when I do the following steps:
>
> clean make (from the root directory) (I think it has to do with the installation)
> build the data clusters again
> boot the servers
> make installcheck
>

I forgot: before you build the data clusters, you need to remove the existing ones.

> The crash is the well-known crash related to snapshots (I have lost the error log though). Do we change something installed during make installcheck?
>
> On Mon, Sep 26, 2011 at 8:53 AM, Michael Paquier <mic...@gm...> wrote:
>
>> Hi all,
>>
>> Please find attached the latest regression results.
>> 34 tests failed out of 130 tests.
>>
>> Regards,
>> --
>> Michael Paquier
>> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Ashutosh B. <ash...@en...> - 2011-09-28 10:43:00
|
There is something weird going on with regression runs. I was trying to understand the symptoms for quite some time today. I have at least succeeded in finding out what's needed to have regression runs without a crash.

If I run the regression suite (make installcheck) the first time, it runs well, without any crashes. If I run it again, without shutting down the servers, it crashes. The only time I get a run without any crash is when I do the following steps:

clean make (from the root directory) (I think it has to do with the installation)
build the data clusters again
boot the servers
make installcheck

The crash is the well-known crash related to snapshots (I have lost the error log though). Do we change something installed during make installcheck?

On Mon, Sep 26, 2011 at 8:53 AM, Michael Paquier <mic...@gm...> wrote:

> Hi all,
>
> Please find attached the latest regression results.
> 34 tests failed out of 130 tests.
>
> Regards,
> --
> Michael Paquier
> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2011-09-16 02:05:55
|
On Fri, Sep 16, 2011 at 5:28 AM, Koichi Suzuki <koi...@gm...> wrote:

> I found that two features were changed in the current master.
>
> 1) Datanodes now need a pooler port number.

The pooler has always been a Coordinator process. It has never been needed by Datanodes.

> 2) Options for pg_ctl and postgres changed.
> pg_ctl: the -S option is now the -Z option.

It is true that this was changed after the 9.1 merge; the attached patch corrects that.

> postgres: the -S option is replaced as of 9.0 and later. To control whether a coordinator or a datanode is started, we need to use the -C or -X option.

For postgres, the options have always been the same:
- -X for a Datanode
- -C for a Coordinator

I checked the docs (postgres-ref.sgmlin) and they are correct.
--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-15 20:40:45
|
On Fri, Sep 16, 2011 at 5:23 AM, Koichi Suzuki <koi...@gm...> wrote: > I see the problem. It may add some overhead to create temp table to > all the other coordinator but I hope this does not a problem. > This is not a problem I think, even if there will be additional connections between nodes through the pooler. -- Michael Paquier http://michael.otacoo.com |
From: Koichi S. <koi...@gm...> - 2011-09-15 20:28:50
|
I found that two features were changed in the current master.

1) Datanodes now need a pooler port number.
2) Options for pg_ctl and postgres changed.
   pg_ctl: the -S option is now the -Z option.
   postgres: the -S option is replaced as of 9.0 and later. To control whether a coordinator or a datanode is started, we need to use the -C or -X option.

We need to change the documentation as well.

Regards;
----------
Koichi Suzuki
From: Koichi S. <koi...@gm...> - 2011-09-15 20:23:56
|
I see the problem. It may add some overhead to create the temp table on all the other coordinators, but I hope this is not a problem.

----------
Koichi Suzuki

2011/9/15 Michael Paquier <mic...@gm...>:

> Hi all,
>
> While playing with temporary tables, I found an issue when trying to use a LIKE on a temporary table to create a non-temporary table.
>
> template1=# create temp table aa (a int);
> CREATE TABLE
> template1=# create table bb (like aa);
> ERROR: relation "aa" does not exist
>
> The origin of this problem is that a temporary table is only created on the local coordinator and on all the datanodes.
> This could be solved by enforcing temp table creation on all the nodes.
> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-15 19:31:47
|
Hi all,

While playing with temporary tables, I found an issue when trying to use a LIKE on a temporary table to create a non-temporary table.

template1=# create temp table aa (a int);
CREATE TABLE
template1=# create table bb (like aa);
ERROR: relation "aa" does not exist

The origin of this problem is that a temporary table is only created on the local coordinator and on all the datanodes.
This could be solved by enforcing temp table creation on all the nodes.
--
Michael Paquier
http://michael.otacoo.com
From: Koichi S. <koi...@gm...> - 2011-09-05 01:03:23
|
I'm afraid this was caused by Listen/Notify/Unlisten, which we don't support yet.

----------
Koichi Suzuki

2011/9/5 Michael Paquier <mic...@gm...>:

> Hi all,
>
> When running make check, it is possible that the XC cluster freezes, waiting for a lock held by another transaction. Presumably a 2PC lock.
> 2PC is mandatory for write transactions involving more than 2 nodes, and when a regression test issues a COMMIT, it is possible that this becomes a 2PC if a DDL has been launched.
>
> I suggest we identify the test cases that may conflict with 2PC locks and not parallelize them.
> What do you think?
> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-09-05 00:47:00
|
Hi all,

When running make check, it is possible that the XC cluster freezes, waiting for a lock held by another transaction. Presumably a 2PC lock.
2PC is mandatory for write transactions involving more than 2 nodes, and when a regression test issues a COMMIT, it is possible that this becomes a 2PC if a DDL has been launched.

I suggest we identify the test cases that may conflict with 2PC locks and not parallelize them.
What do you think?
--
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2011-09-02 08:43:28
|
Hmm, OK. Please commit the patch. It would be nice to write down somewhere how we choose different ports and directories for the various coordinators and datanodes.

On Fri, Sep 2, 2011 at 1:31 PM, Michael Paquier <mic...@gm...> wrote:

> On Fri, Sep 2, 2011 at 3:32 PM, Ashutosh Bapat <ash...@en...> wrote:
>
>> Hi Michael,
>> Sorry for the delay.
>> Here are my comments:
>> 1. This patch adds xc_groupby xc_distkey xc_having xc_temp to the parallel schedule. Does that mean that these tests will be run simultaneously? If so, we have to make sure that they create tables/objects with different names so that they do not conflict with each other. Are we going to use a second coordinator for firing the parallel test case?
>
> Those test cases run OK in parallel.
>
>> 2. It may be better to separate the XC-specific code into a separate C file, pgxc_regress.c or something, and call it from pg_regress.c. That way pg_regress will remain clean.
>
> I am not sure that is the way to do it.
> In order to keep the code compact I wrote a lot of functions that use static variables of pg_regress.c, as the same operations are repeated several times.
> If I put the additional functions in another file I will have to export those variables or add extra arguments when calling the external APIs.
> This may make the code heavier rather than lighter.
> --
> Michael Paquier
> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2011-09-02 08:02:06
|
On Fri, Sep 2, 2011 at 3:32 PM, Ashutosh Bapat <ash...@en...> wrote:

> Hi Michael,
> Sorry for the delay.
> Here are my comments:
> 1. This patch adds xc_groupby xc_distkey xc_having xc_temp to the parallel schedule. Does that mean that these tests will be run simultaneously? If so, we have to make sure that they create tables/objects with different names so that they do not conflict with each other. Are we going to use a second coordinator for firing the parallel test case?

Those test cases run OK in parallel.

> 2. It may be better to separate the XC-specific code into a separate C file, pgxc_regress.c or something, and call it from pg_regress.c. That way pg_regress will remain clean.

I am not sure that is the way to do it.
In order to keep the code compact I wrote a lot of functions that use static variables of pg_regress.c, as the same operations are repeated several times.
If I put the additional functions in another file I will have to export those variables or add extra arguments when calling the external APIs.
This may make the code heavier rather than lighter.
--
Michael Paquier
http://michael.otacoo.com
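A rough sketch of the alternative being suggested, under the assumption that the shared pg_regress.c state could be bundled into one context struct instead of exporting each static variable; every name here is illustrative, not existing code.

/* pgxc_regress.h (hypothetical) */
typedef struct RegressContext
{
    const char *bindir;            /* stand-ins for pg_regress.c statics */
    const char *temp_install;
    int         coordinator_count;
    int         datanode_count;
    int         base_port;
} RegressContext;

/* XC-specific helpers would move to pgxc_regress.c and take the context
 * explicitly, so pg_regress.c only needs to pass one pointer instead of
 * exposing its static variables or growing long argument lists. */
extern void pgxc_init_cluster(const RegressContext *ctx);
extern void pgxc_start_nodes(const RegressContext *ctx);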