From: Koichi S. <koi...@gm...> - 2012-06-20 06:42:20
To monitor whether each XC component is running, psql is not sufficient because it does not check gtm/gtm_proxy/datanode, and psql-based detection may take time. As discussed at the cluster summit (http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit), a watchdog timer would be nice for this purpose.

Here's a design of the watchdog timer:

1. Have separate shared memory for each component,
2. The postmaster and the gtm/gtm_proxy server main loop each increment their watchdog timer,
3. The timer is read by a separate command to report any fault.

For this purpose, we need some GUC and GTM/GTM-Proxy configuration parameters to specify:

a. whether the watchdog timer is on,
b. the timer increment interval (maybe in milliseconds).

The shmid for each component will be kept in the pg_control, gtm.control and gtm_proxy.control files. An API to attach the shared memory for the watchdog and read the timer value will be provided too.

Regards;
----------
Koichi Suzuki
From: Koichi S. <koi...@gm...> - 2012-06-20 06:21:32
Fixed the bug. The cause was an incompatible definition of GTM_ThreadInfo and GTMProxy_ThreadInfo: the third entry in GTM_ThreadInfo, is_main_thread, is missing in GTMProxy_ThreadInfo, which caused different offsets of the following members in a 32-bit environment. In a 64-bit environment, the space for the missing member is absorbed by padding. A fixing patch is enclosed.

Regards;
----------
Koichi Suzuki

2012/6/20 Koichi Suzuki <koi...@gm...>:
> Somehow, this thread ran in private, which is not good at all.
>
> The gtm_proxy crash in a 32-bit environment has been discussed between me and
> Plexo Rama. Now I added this to the bug tracker with ID 3536469.
>
> So far, in the 32-bit environment, I found that thr_thread_context and
> thr_current_context are not set properly. They're set to NULL. On
> the other hand, in 64-bit, all the thread information members are set
> to proper values.
>
> Now finding what made this difference.
>
> Regards;
> ----------
> Koichi Suzuki
>
> 2012/6/19 Koichi Suzuki <koi...@gm...>
>>
>> Hi,
>>
>> The problem is MemoryContextAllocZero receives a NULL MemoryContext, which
>> shall be CurrentMemoryContext. palloc0() does this and
>> CurrentMemoryContext should have been set in BaseInit().
>>
>> It's very straightforward, there are no structs involved, and I need to
>> run it in the 32-bit environment to see what is really going on.
>>
>> Anyway, this information saved much of my time.
>> >> Thank you very much; >> ---------- >> Koichi Suzuki >> >> >> >> 2012/6/18 plexo rama <ple...@gm...> >>> >>> Suzuki-san, >>> >>> this is the output of the backtrace command: >>> >>> #0 0x08052bcc in MemoryContextAllocZero (context=0x0, size=128) at >>> mcxt.c:590 >>> #1 0x08051da1 in GTMProxy_ThreadAdd (thrinfo=0x807c538) at >>> proxy_thread.c:72 >>> #2 0x0804ac02 in BaseInit () at proxy_main.c:279 >>> #3 0x0804bd3f in main (argc=3, argv=0xbffff564) at proxy_main.c:836 >>> >>> Please note that the line 590 of mcxt.c maps to line 585 in the original >>> mcxt.c (wich is distributed in the v1.0.0 archive). >>> You must have an instance of GTM running before starting gtm_proxy, >>> otherwise the segfault won't occur. >>> >>> >>> Plexo >>> >>> >>> >>> 2012/6/18 Koichi Suzuki <koi...@gm...> >>>> >>>> I'm trying to fix this though it is only for 32bit. >>>> >>>> Because it may take a bit for me to build 32bit environment, If you >>>> have core file of GTM crash, it's very helpful if you send be back trace of >>>> the core (bt command) by gdb. I hope back trace will pinpoint the cause of >>>> the bug. >>>> >>>> Best Regards; >>>> ---------- >>>> Koichi Suzuki >>>> >>>> >>>> >>>> 2012/6/17 plexo rama <ple...@gm...> >>>>> >>>>> Suzuki-san, >>>>> >>>>> it seems the problem only occurs on 32bit systems. >>>>> >>>>> I've compiled the source using CLANG, gcc-4.4 & gcc-4.6 on Ubuntu >>>>> 10.04.2 LTS 32bit system running in a virtual machine hosted on linode.com. >>>>> >>>>> The result was the same each try. I've tried it with the latest >>>>> 1.0.0-beta package as well as v0.9.7. >>>>> >>>>> When starting gtm_proxy on a 32bit system in gdb I always receive >>>>> >>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>> 0x080627ed in MemoryContextAllocZero (context=0x0, size=128) at >>>>> mcxt.c:585 >>>>> 585 ret = (*context->methods->alloc) (context, size); >>>>> >>>>> >>>>> However, I've had success compiling & running the code / gtm_proxy on a >>>>> 64bit system running Ubuntu 10.04.2 LTS 64bit. >>>>> >>>>> I hope that helps? >>>>> >>>>> I'm not sure whether it make sense to look further in to that issue as >>>>> 32bit environments don't really make sense as a production system, except >>>>> for testing. >>>>> Although, it would be good to know, what the issue really is. >>>>> >>>>> >>>>> Plexo >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> 2012/6/16 Koichi Suzuki <koi...@gm...> >>>>>> >>>>>> Plexo; >>>>>> >>>>>> Thanks a lot for the report. I will look into it when back to Japan >>>>>> (now I'm in Beijing). >>>>>> >>>>>> To reproduce the problems, could you let me know your configuration >>>>>> (port and hosts of each component, including GTM, GTM_Proxy, Coordinator and >>>>>> datanodes) and how to reproduce the problem? >>>>>> >>>>>> Also, could you send me bt of the core file? >>>>>> >>>>>> Best Regards; >>>>>> ---------- >>>>>> Koichi Suzuki >>>>>> >>>>>> >>>>>> >>>>>> 2012/6/16 plexo rama <ple...@gm...> >>>>>>> >>>>>>> Suzuki-san, >>>>>>> >>>>>>> I've received another segfault in gtm_proxy >>>>>>> >>>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>>> [Switching to Thread 0xb7e56b70 (LWP 10493)] >>>>>>> 0x0806961f in errfinish (dummy=0) at elog.c:320 >>>>>>> 320 EmitErrorReport(MyPort); >>>>>>> >>>>>>> >>>>>>> it looks like elog.c also requires a special handling, >>>>>>> as MyPort-macro uses GetMyThreadInfo and thus refers >>>>>>> to GTM_ThreadInfo instead of GTMProxy_ThreadInfo >>>>>>> >>>>>>> 2012/6/15 Koichi Suzuki <koi...@gm...> >>>>>>>> >>>>>>>> Yes, I will do it. 
>>>>>>>> >>>>>>>> --- >>>>>>>> Koichi Suzuki >>>>>>>> >>>>>>>> On 2012/06/15, at 8:07, Michael Paquier <mic...@gm...> >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jun 15, 2012 at 7:37 AM, Koichi Suzuki >>>>>>>> <koi...@gm...> wrote: >>>>>>>>> >>>>>>>>> Thanks for the pointing out. Another requirement was to make >>>>>>>>> mcxt.o >>>>>>>>> shared among gtm and gtm_proxy. I will look check if this >>>>>>>>> requirement makes sense now (it did, when very first pgxc_clean was >>>>>>>>> implemented, which was rewritten at V 1.0). >>>>>>>>> >>>>>>>>> I'm afraid the cause of NULL pointer is different. >>>>>>>> >>>>>>>> Suzuki-san, >>>>>>>> >>>>>>>> Could you sort that out with plexo and review any patch he sends? >>>>>>>> You know this area of the code pretty well so I believe you are >>>>>>>> well-suited here. >>>>>>>> >>>>>>>> Regards, >>>>>>>> -- >>>>>>>> Michael Paquier >>>>>>>> http://michael.otacoo.com >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> |
From: Koichi S. <koi...@gm...> - 2012-06-20 03:12:30
Somehow, this thread run in private, which is not good at all. Gtm_proxy crash in 32bit environment has been discussed between me and Plexo Rama. Now I added this to the bug tracker with ID 3536469. So far, in 32bit environment, I found that thr_thread_context and thr_current_context are not set properly. They're set to NULL. On the other hand, in 64bit, all the thread information members are set to proper values. Now finding what made this difference. Regards; ---------- Koichi Suzuki 2012/6/19 Koichi Suzuki <koi...@gm...> > > Hi, > > The problem is MemoryContextAllocZero receives NULL MemoryContext, which > shall be CurrentMemoryContext. palloc0() does this and > CurrentMemoryContext should have been set in BaseInit(). > > It's very straightforward and there're no struct involved and I need to > run it in the 32bit environment to see what is going on really. > > Anyway, this information saved much of my time. > > Thank you very much; > ---------- > Koichi Suzuki > > > > 2012/6/18 plexo rama <ple...@gm...> >> >> Suzuki-san, >> >> this is the output of the backtrace command: >> >> #0 0x08052bcc in MemoryContextAllocZero (context=0x0, size=128) at >> mcxt.c:590 >> #1 0x08051da1 in GTMProxy_ThreadAdd (thrinfo=0x807c538) at >> proxy_thread.c:72 >> #2 0x0804ac02 in BaseInit () at proxy_main.c:279 >> #3 0x0804bd3f in main (argc=3, argv=0xbffff564) at proxy_main.c:836 >> >> Please note that the line 590 of mcxt.c maps to line 585 in the original >> mcxt.c (wich is distributed in the v1.0.0 archive). >> You must have an instance of GTM running before starting gtm_proxy, >> otherwise the segfault won't occur. >> >> >> Plexo >> >> >> >> 2012/6/18 Koichi Suzuki <koi...@gm...> >>> >>> I'm trying to fix this though it is only for 32bit. >>> >>> Because it may take a bit for me to build 32bit environment, If you >>> have core file of GTM crash, it's very helpful if you send be back trace of >>> the core (bt command) by gdb. 
I hope back trace will pinpoint the cause of >>> the bug. >>> >>> Best Regards; >>> ---------- >>> Koichi Suzuki >>> >>> >>> >>> 2012/6/17 plexo rama <ple...@gm...> >>>> >>>> Suzuki-san, >>>> >>>> it seems the problem only occurs on 32bit systems. >>>> >>>> I've compiled the source using CLANG, gcc-4.4 & gcc-4.6 on Ubuntu >>>> 10.04.2 LTS 32bit system running in a virtual machine hosted on linode.com. >>>> >>>> The result was the same each try. I've tried it with the latest >>>> 1.0.0-beta package as well as v0.9.7. >>>> >>>> When starting gtm_proxy on a 32bit system in gdb I always receive >>>> >>>> Program received signal SIGSEGV, Segmentation fault. >>>> 0x080627ed in MemoryContextAllocZero (context=0x0, size=128) at >>>> mcxt.c:585 >>>> 585 ret = (*context->methods->alloc) (context, size); >>>> >>>> >>>> However, I've had success compiling & running the code / gtm_proxy on a >>>> 64bit system running Ubuntu 10.04.2 LTS 64bit. >>>> >>>> I hope that helps? >>>> >>>> I'm not sure whether it make sense to look further in to that issue as >>>> 32bit environments don't really make sense as a production system, except >>>> for testing. >>>> Although, it would be good to know, what the issue really is. >>>> >>>> >>>> Plexo >>>> >>>> >>>> >>>> >>>> >>>> 2012/6/16 Koichi Suzuki <koi...@gm...> >>>>> >>>>> Plexo; >>>>> >>>>> Thanks a lot for the report. I will look into it when back to Japan >>>>> (now I'm in Beijing). >>>>> >>>>> To reproduce the problems, could you let me know your configuration >>>>> (port and hosts of each component, including GTM, GTM_Proxy, Coordinator and >>>>> datanodes) and how to reproduce the problem? >>>>> >>>>> Also, could you send me bt of the core file? >>>>> >>>>> Best Regards; >>>>> ---------- >>>>> Koichi Suzuki >>>>> >>>>> >>>>> >>>>> 2012/6/16 plexo rama <ple...@gm...> >>>>>> >>>>>> Suzuki-san, >>>>>> >>>>>> I've received another segfault in gtm_proxy >>>>>> >>>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>>> [Switching to Thread 0xb7e56b70 (LWP 10493)] >>>>>> 0x0806961f in errfinish (dummy=0) at elog.c:320 >>>>>> 320 EmitErrorReport(MyPort); >>>>>> >>>>>> >>>>>> it looks like elog.c also requires a special handling, >>>>>> as MyPort-macro uses GetMyThreadInfo and thus refers >>>>>> to GTM_ThreadInfo instead of GTMProxy_ThreadInfo >>>>>> >>>>>> 2012/6/15 Koichi Suzuki <koi...@gm...> >>>>>>> >>>>>>> Yes, I will do it. >>>>>>> >>>>>>> --- >>>>>>> Koichi Suzuki >>>>>>> >>>>>>> On 2012/06/15, at 8:07, Michael Paquier <mic...@gm...> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 15, 2012 at 7:37 AM, Koichi Suzuki >>>>>>> <koi...@gm...> wrote: >>>>>>>> >>>>>>>> Thanks for the pointing out. Another requirement was to make >>>>>>>> mcxt.o >>>>>>>> shared among gtm and gtm_proxy. I will look check if this >>>>>>>> requirement makes sense now (it did, when very first pgxc_clean was >>>>>>>> implemented, which was rewritten at V 1.0). >>>>>>>> >>>>>>>> I'm afraid the cause of NULL pointer is different. >>>>>>> >>>>>>> Suzuki-san, >>>>>>> >>>>>>> Could you sort that out with plexo and review any patch he sends? >>>>>>> You know this area of the code pretty well so I believe you are >>>>>>> well-suited here. >>>>>>> >>>>>>> Regards, >>>>>>> -- >>>>>>> Michael Paquier >>>>>>> http://michael.otacoo.com >>>>>> >>>>>> >>>>> >>>> >>> >> > |
From: Michael P. <mic...@gm...> - 2012-06-20 00:58:26
Hi,

Just a comment on this thread... You have not been forwarding messages to the hackers ML for the last couple of emails, so I am putting the thread back on track to let people see the message history. The messages are included below. Thanks.

On Tue, Jun 19, 2012 at 5:30 PM, Koichi Suzuki <koi...@gm...> wrote:
> Okay. I'm building my 32-bit environment and will look into it. Sorry, I
> have to find some time slot for this...
> ----------
> Koichi Suzuki
>
> 2012/6/19 plexo rama <ple...@gm...>
>
>> Suzuki-San,
>>
>> I guess you didn't read my initial mail from 14th June, which states
>> exactly the same analysis. However, as CurrentMemoryContext is involved,
>> which is a MACRO and thus resolves to GetMyThreadInfo, which in turn
>> references GTM_ThreadInfo, there is a structure involved in this.
>>
>> Actually, CurrentMemoryContext used by palloc0() returns
>> 0x0, as the thr_current_context member of GTM_ThreadInfo
>> contains 0x0. However, examining the structure returned by GetMyThreadInfo
>> reveals that thr_message_context contains a value after BaseInit() /
>> MemoryContextInit() is called, which should not be the case because
>> thr_message_context is not initialized in MemoryContextInit(). Thus my
>> conclusion: the structure used by CurrentMemoryContext, TopMemoryContext &
>> ErrorContext in MemoryContextInit() points to the wrong memory
>> location (off-by-1 so to say).
>>
>> I'm glad I could help to shed some light on this.
>>
>> Plexo
>>
>> 2012/6/19 Koichi Suzuki <koi...@gm...>
>>
>>> Hi,
>>>
>>> The problem is MemoryContextAllocZero receives a NULL MemoryContext, which
>>> shall be CurrentMemoryContext. palloc0() does this and
>>> CurrentMemoryContext should have been set in BaseInit().
>>>
>>> It's very straightforward, there are no structs involved, and I need to
>>> run it in the 32-bit environment to see what is really going on.
>>>
>>> Anyway, this information saved much of my time.
>>> >>> Thank you very much; >>> ---------- >>> Koichi Suzuki >>> >>> >>> >>> 2012/6/18 plexo rama <ple...@gm...> >>> >>>> Suzuki-san, >>>> >>>> this is the output of the backtrace command: >>>> * >>>> * >>>> *#0 0x08052bcc in MemoryContextAllocZero (context=0x0, size=128) at >>>> mcxt.c:590* >>>> *#1 0x08051da1 in GTMProxy_ThreadAdd (thrinfo=0x807c538) at >>>> proxy_thread.c:72* >>>> *#2 0x0804ac02 in BaseInit () at proxy_main.c:279* >>>> *#3 0x0804bd3f in main (argc=3, argv=0xbffff564) at proxy_main.c:836* >>>> >>>> Please note that the line *590 of mcxt.c maps to line 585 in the >>>> original mcxt.c* (wich is distributed in the v1.0.0 archive). >>>> You must have an instance of GTM running before starting gtm_proxy, >>>> otherwise the segfault won't occur. >>>> >>>> >>>> Plexo >>>> >>>> >>>> >>>> 2012/6/18 Koichi Suzuki <koi...@gm...> >>>> >>>>> I'm trying to fix this though it is only for 32bit. >>>>> >>>>> Because it may take a bit for me to build 32bit environment, If you >>>>> have core file of GTM crash, it's very helpful if you send be back trace of >>>>> the core (bt command) by gdb. I hope back trace will pinpoint the cause >>>>> of the bug. >>>>> >>>>> Best Regards; >>>>> ---------- >>>>> Koichi Suzuki >>>>> >>>>> >>>>> >>>>> 2012/6/17 plexo rama <ple...@gm...> >>>>> >>>>>> Suzuki-san, >>>>>> >>>>>> it seems the problem only occurs on 32bit systems. >>>>>> >>>>>> I've compiled the source using CLANG, gcc-4.4 & gcc-4.6 on Ubuntu >>>>>> 10.04.2 LTS 32bit system running in a virtual machine hosted on >>>>>> linode.com. >>>>>> >>>>>> The result was the same each try. I've tried it with the latest >>>>>> 1.0.0-beta package as well as v0.9.7. >>>>>> >>>>>> When starting gtm_proxy on a 32bit system in gdb I always receive >>>>>> >>>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>>> 0x080627ed in MemoryContextAllocZero (context=0x0, size=128) at >>>>>> mcxt.c:585 >>>>>> 585 ret = (*context->methods->alloc) (context, size); >>>>>> >>>>>> >>>>>> However, I've had success compiling & running the code / gtm_proxy on >>>>>> a 64bit system running Ubuntu 10.04.2 LTS 64bit. >>>>>> >>>>>> I hope that helps? >>>>>> >>>>>> I'm not sure whether it make sense to look further in to that issue >>>>>> as 32bit environments don't really make sense as a production system, >>>>>> except for testing. >>>>>> Although, it would be good to know, what the issue really is. >>>>>> >>>>>> >>>>>> Plexo >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 2012/6/16 Koichi Suzuki <koi...@gm...> >>>>>> >>>>>>> Plexo; >>>>>>> >>>>>>> Thanks a lot for the report. I will look into it when back to >>>>>>> Japan (now I'm in Beijing). >>>>>>> >>>>>>> To reproduce the problems, could you let me know your configuration >>>>>>> (port and hosts of each component, including GTM, GTM_Proxy, Coordinator >>>>>>> and datanodes) and how to reproduce the problem? >>>>>>> >>>>>>> Also, could you send me bt of the core file? >>>>>>> >>>>>>> Best Regards; >>>>>>> ---------- >>>>>>> Koichi Suzuki >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2012/6/16 plexo rama <ple...@gm...> >>>>>>> >>>>>>>> Suzuki-san, >>>>>>>> >>>>>>>> I've received another segfault in gtm_proxy >>>>>>>> >>>>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>>>> [Switching to Thread 0xb7e56b70 (LWP 10493)] >>>>>>>> 0x0806961f in errfinish (dummy=0) at elog.c:320 >>>>>>>> 320 EmitErrorReport(MyPort); >>>>>>>> >>>>>>>> >>>>>>>> it looks like elog.c also requires a special handling, >>>>>>>> as *MyPort-macro* uses *GetMyThreadInfo *and thus refers >>>>>>>> to GTM_ThreadInfo* instead of GTMProxy_ThreadInfo* >>>>>>>> >>>>>>>> 2012/6/15 Koichi Suzuki <koi...@gm...> >>>>>>>> >>>>>>>>> Yes, I will do it. 
>>>>>>>>> >>>>>>>>> --- >>>>>>>>> Koichi Suzuki >>>>>>>>> >>>>>>>>> On 2012/06/15, at 8:07, Michael Paquier <mic...@gm...> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Jun 15, 2012 at 7:37 AM, Koichi Suzuki < >>>>>>>>> koi...@gm...> wrote: >>>>>>>>> >>>>>>>>>> Thanks for the pointing out. Another requirement was to make >>>>>>>>>> mcxt.o >>>>>>>>>> shared among gtm and gtm_proxy. I will look check if this >>>>>>>>>> requirement makes sense now (it did, when very first pgxc_clean >>>>>>>>>> was >>>>>>>>>> implemented, which was rewritten at V 1.0). >>>>>>>>>> >>>>>>>>>> I'm afraid the cause of NULL pointer is different. >>>>>>>>>> >>>>>>>>> Suzuki-san, >>>>>>>>> >>>>>>>>> Could you sort that out with plexo and review any patch he sends? >>>>>>>>> You know this area of the code pretty well so I believe you are >>>>>>>>> well-suited here. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> -- >>>>>>>>> Michael Paquier >>>>>>>>> http://michael.otacoo.com >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -- Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-06-20 00:29:46
In this patch, the SQL/catalog management and the distribution mechanism use fully separated APIs. So even if I do not think it is necessary to change the SQL part, the redistribution mechanism can be changed at will.

For the time being, the redistribution mechanism is not really performant (well, that was not the goal of this prototype), because it uses the following model:
1) Creation of a storage table (unlogged) with default distribution (CTAS)
2) Take necessary locks on the storage and redistributed tables
3) Delete all data on the redistributed table
4) Update catalogs with the new distribution information
5) Perform INSERT SELECT from the storage table to the redistributed table
6) DROP the storage table

As mentioned by Ashutosh, the data needs to travel 4 times through the network, so it can really take a lot of time for tables with gigabytes of data. The network itself is not the bottleneck; it is the usage of the Postgres framework. This is especially true when the queries used by the redistribution mechanism cannot be pushed down. The worst case is when a table is redistributed to hash/modulo on multiple nodes, as it is then necessary to plan each INSERT for the redistribution. As also mentioned by Ashutosh, this can create a huge amount of xlogs on remote Datanodes, not really welcome after a crash recovery.

There are several possible ways to dramatically improve the redistribution mechanism. Here are 3 ideas.

1) Create the storage table with data on a single node
Here we reduce the load on the network, but it cannot solve the problem of tables redistributed to modulo/hash on multiple nodes. It will create a lot of INSERT queries for a slow result.

2) Use a COPY mechanism
One of the simple solutions. Instead of using a costly storage table in the cluster, store the data on the Coordinator during the redistribution:
a) COPY the data of the table being redistributed to a file in $PGDATA of the Coordinator. Why not $PGDATA/pg_distrib/oid?
b) DELETE all the data on the table being redistributed
c) Update catalogs
d) COPY FROM the file to the table with the new distribution type
Network load is halved, and COPY is also really faster. Servers hosting a Coordinator are not chosen for their disk I/O, but the folder $PGDATA/pg_distrib could be linked to a folder where a faster disk is mounted. This also gets rid of the storage table. The only thing to take care of is the deletion of the temporary data file once the redistribution transaction commits or aborts. The data file could also be compressed to reduce the space consumed and the I/O on disk.

3) Use a batching process to communicate only the necessary tuples from Datanodes to the Coordinator
Suggested by Ashutosh, this can use the COPY protocol to redistribute the tuples in batches. The idea is to send from the Datanodes to the Coordinator only the tuples that need to be redistributed, and then let the Coordinator redistribute all the data correctly depending on the new distribution. This avoids having to store the redistributed data temporarily, and all the transfer is managed by cache on the Coordinator. This idea has a couple of limitations though:
- A Datanode is not aware of the existence of the other nodes in the cluster. Distribution data is currently only available at the Coordinator in the catalog pgxc_class, and this distribution data contains the list of nodes where data is distributed. This is directly dependent on the catalog pgxc_node, so a Datanode cannot know whether a tuple is at the correct place or not. This could be countered by allowing node DDLs to run on Datanodes, but this adds an additional constraint on cluster setup as it forces the cluster designer to update all the pgxc_node catalogs on all the nodes. Having a pgxc_node catalog on a Datanode would make sense if it communicated with other nodes through the same pooler as the Coordinator, but this also raises issues with multiple backends open on one node for the same session, which is dangerous for transaction handling.
- Visibility concerns.
What ensures that a tuple has been selected only once? As redistribution is a cluster-based mechanism, what can ensure that a scan on a Datanode is not taking into account some tuples that have already been redistributed?

Method 1 looks useless from the point of view of performance. Method 2 should have good performance; the only point is that data has to be located on the Coordinator server temporarily while redistribution is being done. We could also use some GUC parameters to allow the DBA to customize the way the redistribution data folder is stored (compression type, file name format...). I have some concerns about method 3, as explained above. I might not be taking into account all the potential problems, or may have a limited view of this mechanism, but it introduces some new dependencies on cluster setup which may not be necessary. However, any discussion on the subject is welcome.

Suggestions are welcome.

On Wed, Jun 20, 2012 at 8:40 AM, Michael Paquier <mic...@gm...> wrote:
>
> On Wed, Jun 20, 2012 at 4:19 AM, Abbas Butt <abb...@en...> wrote:
>
>> You forgot to attach the patch.
>>
> Sorry here is the patch.
>
>> On Tue, Jun 19, 2012 at 10:58 AM, Michael Paquier <mic...@gm...> wrote:
>>
>>> Hi all,
>>>
>>> Please find attached an improved patch. I corrected the following points:
>>> - Storage table uses an access exclusive lock meaning it cannot be
>>> accessed by other sessions in cluster
>>> - The table redistributed uses an exclusive lock, it can be accessed by
>>> the other sessions in cluster with SELECT while redistribution is running
>>> - Addition of an API to manage table locking
>>> - Correction of bugs regarding session concurrency. An update in
>>> pgxc_class (update of distribution data) was not seen by concurrent
>>> sessions in cluster.
>>> - doc correction and completion >>> - regression fixes due to grammar change for node list in CTAS, CREATE >>> TABLE, EXECUTE DIRECT and CLEAN CONNECTION >>> - Fix of system functions using EXECUTE direct >>> - Fix for CTAS query generation >>> - update index of catalog pgxc_class updated >>> - Correct update for relation cache when location data is updated >>> >>> Questions are welcome. >>> This patch can be applied on master and works as expected. >>> >>> On Mon, Jun 18, 2012 at 5:25 PM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi all, >>>> >>>> Based on the design above, I went to the end of my idea and took a day >>>> to write a prototype for online redistribution based on ALTER TABLE. >>>> It uses the grammar written in previous mail with ADD NODE/DELETE >>>> NODE/DISTRIBUTE BY/TO NODE | GROUP. >>>> >>>> The main idea is the use of what I call a "storage" table which is used >>>> as a temporary location for the data being distributed in cluster. >>>> This table is created as unlogged >>>> >>>> The patch sticks with the design invocated before; >>>> - Cached plans are dropped when redistribution is invocated >>>> - Vacuum is not necessary, this mechanism uses transaction-safe queries >>>> - for the time being, this implementation uses an exclusive lock, but >>>> as the redistribution is done, a ShareUpdateExclusive lock is not to >>>> exclude. >>>> - tables are reindexed if necessary. >>>> - redistribution cannot be done inside a transaction block >>>> - redistribution is not authorized with all the other commands as they >>>> are locally-safe on each node. >>>> - no restrictions on the distribution types, table types or subclusters >>>> >>>> This feature can be really improved for example in the case of >>>> replicated tables in particular, when the list of nodes of the table is >>>> changed. 
>>>> It is one of the things I would like to improve as it would really >>>> increase performance >>>> >>>> Regards, >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. 
If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > Michael Paquier > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |