From: Koichi S. <koi...@gm...> - 2012-06-20 06:42:20
To monitor whether each XC component is running, psql is not sufficient because it does not check gtm/gtm_proxy/datanode, and psql-based detection may take time. As discussed at the cluster summit (http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit), a watchdog timer would be nice for this purpose.

Here's a design of the watchdog timer:

1. Have separate shared memory for each component,
2. The postmaster and the gtm/gtm_proxy server main loop each increment their watchdog timer,
3. The timer is read by a separate command to report any fault.

For this purpose, we need some GUC and GTM/GTM-Proxy configuration parameters to specify:

a. whether the watchdog timer is on,
b. the timer increment interval (maybe in milliseconds).

The shmid for each component will be kept in the pg_control, gtm.control and gtm_proxy.control files. An API to attach the shared memory for the watchdog and read the timer value will be provided too.

Regards;
----------
Koichi Suzuki
From: Koichi S. <koi...@gm...> - 2012-06-20 06:21:32
Fixed the bug. The cause was an incompatible definition of GTM_ThreadInfo and GTMProxy_ThreadInfo: the third entry in GTM_ThreadInfo, is_main_thread, is missing in GTMProxy_ThreadInfo, which caused different offsets of the following members in a 32-bit environment. In a 64-bit environment, the space for the missing member is absorbed by padding. A fixing patch is enclosed.

Regards;
----------
Koichi Suzuki

2012/6/20 Koichi Suzuki <koi...@gm...>:
> Somehow, this thread ran in private, which is not good at all.
>
> The gtm_proxy crash in a 32-bit environment has been discussed between me and
> Plexo Rama. Now I added this to the bug tracker with ID 3536469.
>
> So far, in the 32-bit environment, I found that thr_thread_context and
> thr_current_context are not set properly. They're set to NULL. On
> the other hand, in 64-bit, all the thread information members are set
> to proper values.
>
> Now finding what made this difference.
>
> Regards;
> ----------
> Koichi Suzuki
>
> 2012/6/19 Koichi Suzuki <koi...@gm...>
>>
>> Hi,
>>
>> The problem is MemoryContextAllocZero receives a NULL MemoryContext, which
>> shall be CurrentMemoryContext. palloc0() does this and
>> CurrentMemoryContext should have been set in BaseInit().
>>
>> It's very straightforward, there are no structs involved, and I need to
>> run it in the 32-bit environment to see what is really going on.
>>
>> Anyway, this information saved much of my time.
>> >> Thank you very much; >> ---------- >> Koichi Suzuki >> >> >> >> 2012/6/18 plexo rama <ple...@gm...> >>> >>> Suzuki-san, >>> >>> this is the output of the backtrace command: >>> >>> #0 0x08052bcc in MemoryContextAllocZero (context=0x0, size=128) at >>> mcxt.c:590 >>> #1 0x08051da1 in GTMProxy_ThreadAdd (thrinfo=0x807c538) at >>> proxy_thread.c:72 >>> #2 0x0804ac02 in BaseInit () at proxy_main.c:279 >>> #3 0x0804bd3f in main (argc=3, argv=0xbffff564) at proxy_main.c:836 >>> >>> Please note that the line 590 of mcxt.c maps to line 585 in the original >>> mcxt.c (wich is distributed in the v1.0.0 archive). >>> You must have an instance of GTM running before starting gtm_proxy, >>> otherwise the segfault won't occur. >>> >>> >>> Plexo >>> >>> >>> >>> 2012/6/18 Koichi Suzuki <koi...@gm...> >>>> >>>> I'm trying to fix this though it is only for 32bit. >>>> >>>> Because it may take a bit for me to build 32bit environment, If you >>>> have core file of GTM crash, it's very helpful if you send be back trace of >>>> the core (bt command) by gdb. I hope back trace will pinpoint the cause of >>>> the bug. >>>> >>>> Best Regards; >>>> ---------- >>>> Koichi Suzuki >>>> >>>> >>>> >>>> 2012/6/17 plexo rama <ple...@gm...> >>>>> >>>>> Suzuki-san, >>>>> >>>>> it seems the problem only occurs on 32bit systems. >>>>> >>>>> I've compiled the source using CLANG, gcc-4.4 & gcc-4.6 on Ubuntu >>>>> 10.04.2 LTS 32bit system running in a virtual machine hosted on linode.com. >>>>> >>>>> The result was the same each try. I've tried it with the latest >>>>> 1.0.0-beta package as well as v0.9.7. >>>>> >>>>> When starting gtm_proxy on a 32bit system in gdb I always receive >>>>> >>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>> 0x080627ed in MemoryContextAllocZero (context=0x0, size=128) at >>>>> mcxt.c:585 >>>>> 585 ret = (*context->methods->alloc) (context, size); >>>>> >>>>> >>>>> However, I've had success compiling & running the code / gtm_proxy on a >>>>> 64bit system running Ubuntu 10.04.2 LTS 64bit. >>>>> >>>>> I hope that helps? >>>>> >>>>> I'm not sure whether it make sense to look further in to that issue as >>>>> 32bit environments don't really make sense as a production system, except >>>>> for testing. >>>>> Although, it would be good to know, what the issue really is. >>>>> >>>>> >>>>> Plexo >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> 2012/6/16 Koichi Suzuki <koi...@gm...> >>>>>> >>>>>> Plexo; >>>>>> >>>>>> Thanks a lot for the report. I will look into it when back to Japan >>>>>> (now I'm in Beijing). >>>>>> >>>>>> To reproduce the problems, could you let me know your configuration >>>>>> (port and hosts of each component, including GTM, GTM_Proxy, Coordinator and >>>>>> datanodes) and how to reproduce the problem? >>>>>> >>>>>> Also, could you send me bt of the core file? >>>>>> >>>>>> Best Regards; >>>>>> ---------- >>>>>> Koichi Suzuki >>>>>> >>>>>> >>>>>> >>>>>> 2012/6/16 plexo rama <ple...@gm...> >>>>>>> >>>>>>> Suzuki-san, >>>>>>> >>>>>>> I've received another segfault in gtm_proxy >>>>>>> >>>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>>> [Switching to Thread 0xb7e56b70 (LWP 10493)] >>>>>>> 0x0806961f in errfinish (dummy=0) at elog.c:320 >>>>>>> 320 EmitErrorReport(MyPort); >>>>>>> >>>>>>> >>>>>>> it looks like elog.c also requires a special handling, >>>>>>> as MyPort-macro uses GetMyThreadInfo and thus refers >>>>>>> to GTM_ThreadInfo instead of GTMProxy_ThreadInfo >>>>>>> >>>>>>> 2012/6/15 Koichi Suzuki <koi...@gm...> >>>>>>>> >>>>>>>> Yes, I will do it. 
>>>>>>>> >>>>>>>> --- >>>>>>>> Koichi Suzuki >>>>>>>> >>>>>>>> On 2012/06/15, at 8:07, Michael Paquier <mic...@gm...> >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jun 15, 2012 at 7:37 AM, Koichi Suzuki >>>>>>>> <koi...@gm...> wrote: >>>>>>>>> >>>>>>>>> Thanks for the pointing out. Another requirement was to make >>>>>>>>> mcxt.o >>>>>>>>> shared among gtm and gtm_proxy. I will look check if this >>>>>>>>> requirement makes sense now (it did, when very first pgxc_clean was >>>>>>>>> implemented, which was rewritten at V 1.0). >>>>>>>>> >>>>>>>>> I'm afraid the cause of NULL pointer is different. >>>>>>>> >>>>>>>> Suzuki-san, >>>>>>>> >>>>>>>> Could you sort that out with plexo and review any patch he sends? >>>>>>>> You know this area of the code pretty well so I believe you are >>>>>>>> well-suited here. >>>>>>>> >>>>>>>> Regards, >>>>>>>> -- >>>>>>>> Michael Paquier >>>>>>>> http://michael.otacoo.com >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> |
From: Koichi S. <koi...@gm...> - 2012-06-20 03:12:30
Somehow, this thread run in private, which is not good at all. Gtm_proxy crash in 32bit environment has been discussed between me and Plexo Rama. Now I added this to the bug tracker with ID 3536469. So far, in 32bit environment, I found that thr_thread_context and thr_current_context are not set properly. They're set to NULL. On the other hand, in 64bit, all the thread information members are set to proper values. Now finding what made this difference. Regards; ---------- Koichi Suzuki 2012/6/19 Koichi Suzuki <koi...@gm...> > > Hi, > > The problem is MemoryContextAllocZero receives NULL MemoryContext, which > shall be CurrentMemoryContext. palloc0() does this and > CurrentMemoryContext should have been set in BaseInit(). > > It's very straightforward and there're no struct involved and I need to > run it in the 32bit environment to see what is going on really. > > Anyway, this information saved much of my time. > > Thank you very much; > ---------- > Koichi Suzuki > > > > 2012/6/18 plexo rama <ple...@gm...> >> >> Suzuki-san, >> >> this is the output of the backtrace command: >> >> #0 0x08052bcc in MemoryContextAllocZero (context=0x0, size=128) at >> mcxt.c:590 >> #1 0x08051da1 in GTMProxy_ThreadAdd (thrinfo=0x807c538) at >> proxy_thread.c:72 >> #2 0x0804ac02 in BaseInit () at proxy_main.c:279 >> #3 0x0804bd3f in main (argc=3, argv=0xbffff564) at proxy_main.c:836 >> >> Please note that the line 590 of mcxt.c maps to line 585 in the original >> mcxt.c (wich is distributed in the v1.0.0 archive). >> You must have an instance of GTM running before starting gtm_proxy, >> otherwise the segfault won't occur. >> >> >> Plexo >> >> >> >> 2012/6/18 Koichi Suzuki <koi...@gm...> >>> >>> I'm trying to fix this though it is only for 32bit. >>> >>> Because it may take a bit for me to build 32bit environment, If you >>> have core file of GTM crash, it's very helpful if you send be back trace of >>> the core (bt command) by gdb. 
I hope back trace will pinpoint the cause of >>> the bug. >>> >>> Best Regards; >>> ---------- >>> Koichi Suzuki >>> >>> >>> >>> 2012/6/17 plexo rama <ple...@gm...> >>>> >>>> Suzuki-san, >>>> >>>> it seems the problem only occurs on 32bit systems. >>>> >>>> I've compiled the source using CLANG, gcc-4.4 & gcc-4.6 on Ubuntu >>>> 10.04.2 LTS 32bit system running in a virtual machine hosted on linode.com. >>>> >>>> The result was the same each try. I've tried it with the latest >>>> 1.0.0-beta package as well as v0.9.7. >>>> >>>> When starting gtm_proxy on a 32bit system in gdb I always receive >>>> >>>> Program received signal SIGSEGV, Segmentation fault. >>>> 0x080627ed in MemoryContextAllocZero (context=0x0, size=128) at >>>> mcxt.c:585 >>>> 585 ret = (*context->methods->alloc) (context, size); >>>> >>>> >>>> However, I've had success compiling & running the code / gtm_proxy on a >>>> 64bit system running Ubuntu 10.04.2 LTS 64bit. >>>> >>>> I hope that helps? >>>> >>>> I'm not sure whether it make sense to look further in to that issue as >>>> 32bit environments don't really make sense as a production system, except >>>> for testing. >>>> Although, it would be good to know, what the issue really is. >>>> >>>> >>>> Plexo >>>> >>>> >>>> >>>> >>>> >>>> 2012/6/16 Koichi Suzuki <koi...@gm...> >>>>> >>>>> Plexo; >>>>> >>>>> Thanks a lot for the report. I will look into it when back to Japan >>>>> (now I'm in Beijing). >>>>> >>>>> To reproduce the problems, could you let me know your configuration >>>>> (port and hosts of each component, including GTM, GTM_Proxy, Coordinator and >>>>> datanodes) and how to reproduce the problem? >>>>> >>>>> Also, could you send me bt of the core file? >>>>> >>>>> Best Regards; >>>>> ---------- >>>>> Koichi Suzuki >>>>> >>>>> >>>>> >>>>> 2012/6/16 plexo rama <ple...@gm...> >>>>>> >>>>>> Suzuki-san, >>>>>> >>>>>> I've received another segfault in gtm_proxy >>>>>> >>>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>>> [Switching to Thread 0xb7e56b70 (LWP 10493)] >>>>>> 0x0806961f in errfinish (dummy=0) at elog.c:320 >>>>>> 320 EmitErrorReport(MyPort); >>>>>> >>>>>> >>>>>> it looks like elog.c also requires a special handling, >>>>>> as MyPort-macro uses GetMyThreadInfo and thus refers >>>>>> to GTM_ThreadInfo instead of GTMProxy_ThreadInfo >>>>>> >>>>>> 2012/6/15 Koichi Suzuki <koi...@gm...> >>>>>>> >>>>>>> Yes, I will do it. >>>>>>> >>>>>>> --- >>>>>>> Koichi Suzuki >>>>>>> >>>>>>> On 2012/06/15, at 8:07, Michael Paquier <mic...@gm...> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 15, 2012 at 7:37 AM, Koichi Suzuki >>>>>>> <koi...@gm...> wrote: >>>>>>>> >>>>>>>> Thanks for the pointing out. Another requirement was to make >>>>>>>> mcxt.o >>>>>>>> shared among gtm and gtm_proxy. I will look check if this >>>>>>>> requirement makes sense now (it did, when very first pgxc_clean was >>>>>>>> implemented, which was rewritten at V 1.0). >>>>>>>> >>>>>>>> I'm afraid the cause of NULL pointer is different. >>>>>>> >>>>>>> Suzuki-san, >>>>>>> >>>>>>> Could you sort that out with plexo and review any patch he sends? >>>>>>> You know this area of the code pretty well so I believe you are >>>>>>> well-suited here. >>>>>>> >>>>>>> Regards, >>>>>>> -- >>>>>>> Michael Paquier >>>>>>> http://michael.otacoo.com >>>>>> >>>>>> >>>>> >>>> >>> >> > |
From: Michael P. <mic...@gm...> - 2012-06-20 00:58:26
Hi,

Just a comment on this thread... You have not been forwarding messages to the hackers ML for the last couple of emails, so I am putting the thread back on track to let people see the message history. The messages are included below. Thanks.

On Tue, Jun 19, 2012 at 5:30 PM, Koichi Suzuki <koi...@gm...> wrote:
> Okay. I'm building my 32-bit environment and will look into it. Sorry, I
> have to find some time slot for this...
> ----------
> Koichi Suzuki
>
> 2012/6/19 plexo rama <ple...@gm...>
>
>> Suzuki-San,
>>
>> I guess you didn't read my initial mail from 14th June, which states
>> exactly the same analysis. However, as CurrentMemoryContext is involved,
>> which is a MACRO and thus resolves to GetMyThreadInfo, which in turn
>> references GTM_ThreadInfo, there is a structure involved in this.
>>
>> Actually, CurrentMemoryContext used by palloc0() returns
>> 0x0, as the thr_current_context member of GTM_ThreadInfo
>> contains 0x0. However, examining the structure returned by GetMyThreadInfo
>> reveals that thr_message_context contains a value after BaseInit() /
>> MemoryContextInit() is called, which should not be the case because
>> thr_message_context is not initialized in MemoryContextInit(). Thus my
>> conclusion: the structure used by CurrentMemoryContext, TopMemoryContext &
>> ErrorContext in MemoryContextInit() points to the wrong memory
>> location (off-by-1 so to say).
>>
>> I'm glad I could help to shed some light on this.
>>
>> Plexo
>>
>> 2012/6/19 Koichi Suzuki <koi...@gm...>
>>
>>> Hi,
>>>
>>> The problem is MemoryContextAllocZero receives a NULL MemoryContext, which
>>> shall be CurrentMemoryContext. palloc0() does this and
>>> CurrentMemoryContext should have been set in BaseInit().
>>>
>>> It's very straightforward, there are no structs involved, and I need to
>>> run it in the 32-bit environment to see what is really going on.
>>>
>>> Anyway, this information saved much of my time.
>>> >>> Thank you very much; >>> ---------- >>> Koichi Suzuki >>> >>> >>> >>> 2012/6/18 plexo rama <ple...@gm...> >>> >>>> Suzuki-san, >>>> >>>> this is the output of the backtrace command: >>>> * >>>> * >>>> *#0 0x08052bcc in MemoryContextAllocZero (context=0x0, size=128) at >>>> mcxt.c:590* >>>> *#1 0x08051da1 in GTMProxy_ThreadAdd (thrinfo=0x807c538) at >>>> proxy_thread.c:72* >>>> *#2 0x0804ac02 in BaseInit () at proxy_main.c:279* >>>> *#3 0x0804bd3f in main (argc=3, argv=0xbffff564) at proxy_main.c:836* >>>> >>>> Please note that the line *590 of mcxt.c maps to line 585 in the >>>> original mcxt.c* (wich is distributed in the v1.0.0 archive). >>>> You must have an instance of GTM running before starting gtm_proxy, >>>> otherwise the segfault won't occur. >>>> >>>> >>>> Plexo >>>> >>>> >>>> >>>> 2012/6/18 Koichi Suzuki <koi...@gm...> >>>> >>>>> I'm trying to fix this though it is only for 32bit. >>>>> >>>>> Because it may take a bit for me to build 32bit environment, If you >>>>> have core file of GTM crash, it's very helpful if you send be back trace of >>>>> the core (bt command) by gdb. I hope back trace will pinpoint the cause >>>>> of the bug. >>>>> >>>>> Best Regards; >>>>> ---------- >>>>> Koichi Suzuki >>>>> >>>>> >>>>> >>>>> 2012/6/17 plexo rama <ple...@gm...> >>>>> >>>>>> Suzuki-san, >>>>>> >>>>>> it seems the problem only occurs on 32bit systems. >>>>>> >>>>>> I've compiled the source using CLANG, gcc-4.4 & gcc-4.6 on Ubuntu >>>>>> 10.04.2 LTS 32bit system running in a virtual machine hosted on >>>>>> linode.com. >>>>>> >>>>>> The result was the same each try. I've tried it with the latest >>>>>> 1.0.0-beta package as well as v0.9.7. >>>>>> >>>>>> When starting gtm_proxy on a 32bit system in gdb I always receive >>>>>> >>>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>>> 0x080627ed in MemoryContextAllocZero (context=0x0, size=128) at >>>>>> mcxt.c:585 >>>>>> 585 ret = (*context->methods->alloc) (context, size); >>>>>> >>>>>> >>>>>> However, I've had success compiling & running the code / gtm_proxy on >>>>>> a 64bit system running Ubuntu 10.04.2 LTS 64bit. >>>>>> >>>>>> I hope that helps? >>>>>> >>>>>> I'm not sure whether it make sense to look further in to that issue >>>>>> as 32bit environments don't really make sense as a production system, >>>>>> except for testing. >>>>>> Although, it would be good to know, what the issue really is. >>>>>> >>>>>> >>>>>> Plexo >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 2012/6/16 Koichi Suzuki <koi...@gm...> >>>>>> >>>>>>> Plexo; >>>>>>> >>>>>>> Thanks a lot for the report. I will look into it when back to >>>>>>> Japan (now I'm in Beijing). >>>>>>> >>>>>>> To reproduce the problems, could you let me know your configuration >>>>>>> (port and hosts of each component, including GTM, GTM_Proxy, Coordinator >>>>>>> and datanodes) and how to reproduce the problem? >>>>>>> >>>>>>> Also, could you send me bt of the core file? >>>>>>> >>>>>>> Best Regards; >>>>>>> ---------- >>>>>>> Koichi Suzuki >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2012/6/16 plexo rama <ple...@gm...> >>>>>>> >>>>>>>> Suzuki-san, >>>>>>>> >>>>>>>> I've received another segfault in gtm_proxy >>>>>>>> >>>>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>>>> [Switching to Thread 0xb7e56b70 (LWP 10493)] >>>>>>>> 0x0806961f in errfinish (dummy=0) at elog.c:320 >>>>>>>> 320 EmitErrorReport(MyPort); >>>>>>>> >>>>>>>> >>>>>>>> it looks like elog.c also requires a special handling, >>>>>>>> as *MyPort-macro* uses *GetMyThreadInfo *and thus refers >>>>>>>> to GTM_ThreadInfo* instead of GTMProxy_ThreadInfo* >>>>>>>> >>>>>>>> 2012/6/15 Koichi Suzuki <koi...@gm...> >>>>>>>> >>>>>>>>> Yes, I will do it. 
>>>>>>>>> >>>>>>>>> --- >>>>>>>>> Koichi Suzuki >>>>>>>>> >>>>>>>>> On 2012/06/15, at 8:07, Michael Paquier <mic...@gm...> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Jun 15, 2012 at 7:37 AM, Koichi Suzuki < >>>>>>>>> koi...@gm...> wrote: >>>>>>>>> >>>>>>>>>> Thanks for the pointing out. Another requirement was to make >>>>>>>>>> mcxt.o >>>>>>>>>> shared among gtm and gtm_proxy. I will look check if this >>>>>>>>>> requirement makes sense now (it did, when very first pgxc_clean >>>>>>>>>> was >>>>>>>>>> implemented, which was rewritten at V 1.0). >>>>>>>>>> >>>>>>>>>> I'm afraid the cause of NULL pointer is different. >>>>>>>>>> >>>>>>>>> Suzuki-san, >>>>>>>>> >>>>>>>>> Could you sort that out with plexo and review any patch he sends? >>>>>>>>> You know this area of the code pretty well so I believe you are >>>>>>>>> well-suited here. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> -- >>>>>>>>> Michael Paquier >>>>>>>>> http://michael.otacoo.com >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -- Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-06-20 00:29:46
In this patch, the SQL/catalog management and the distribution mechanism use fully separated APIs. So even if I do not think it is necessary to change the SQL part, the redistribution mechanism can be changed at will.

For the time being, the redistribution mechanism is not really performant (well, that was not the goal of this prototype), because it uses the following model:
1) Creation of a storage table (unlogged) with default distribution (CTAS)
2) Take necessary locks on the storage and redistributed tables
3) Delete all data on the redistributed table
4) Update catalogs with the new distribution information
5) Perform INSERT SELECT from the storage table to the redistributed table
6) DROP the storage table

As mentioned by Ashutosh, the data needs to travel 4 times through the network, so it can really take a lot of time for tables with gigabytes of data. The network itself is not the bottleneck; it is the usage of the Postgres framework. This is especially true when the queries used by the redistribution mechanism cannot be pushed down. The worst case is when a table is redistributed to hash/modulo on multiple nodes, as it is then necessary to plan each INSERT for the redistribution. As also mentioned by Ashutosh, this can create a huge amount of xlogs on remote Datanodes, not really welcome after a crash recovery.

There are several possible ways to dramatically improve the redistribution mechanism. Here are 3 ideas.

1) Create the storage table with data on a single node
Here we reduce the load on the network, but it cannot solve the problem of tables redistributed to modulo/hash on multiple nodes. It will create a lot of INSERT queries for a slow result.

2) Use a COPY mechanism
One of the simple solutions. Instead of using a costly storage table in the cluster, store the data on the Coordinator during the redistribution:
a) COPY the data of the table being redistributed to a file in $PGDATA of the Coordinator. Why not $PGDATA/pg_distrib/oid?
b) DELETE all the data on the table being redistributed
c) Update catalogs
d) COPY FROM the file to the table with the new distribution type
Network load is halved, and COPY is also really faster. Servers hosting a Coordinator are not chosen for their disk I/O, but the folder $PGDATA/pg_distrib could be linked to a folder where a faster disk is mounted. This also gets rid of the storage table. The only thing to take care of is the deletion of the temporary data file once the redistribution transaction commits or aborts. The data file could also be compressed to reduce the space consumed and the I/O on disk.

3) Use a batching process to communicate only the necessary tuples from Datanodes to the Coordinator
Suggested by Ashutosh, this can use the COPY protocol to redistribute the tuples in batches. The idea is to send from the Datanodes to the Coordinator only the tuples that need to be redistributed, and then let the Coordinator redistribute all the data correctly depending on the new distribution. This avoids having to store the redistributed data temporarily, and all the transfer is managed by cache on the Coordinator. This idea has a couple of limitations though:
- A Datanode is not aware of the existence of the other nodes in the cluster. Distribution data is currently only available at the Coordinator in the catalog pgxc_class, and this distribution data contains the list of nodes where data is distributed. This is directly dependent on the catalog pgxc_node, so a Datanode cannot know whether a tuple is at the correct place or not. This could be countered by allowing node DDLs to run on Datanodes, but this adds an additional constraint on cluster setup as it forces the cluster designer to update all the pgxc_node catalogs on all the nodes. Having a pgxc_node catalog on a Datanode would make sense if it communicated with other nodes through the same pooler as the Coordinator, but this also raises issues with multiple backends open on one node for the same session, which is dangerous for transaction handling.
- Visibility concerns.
What ensures that a tuple has been selected only once? As redistribution is a cluster-based mechanism, what can ensure that a scan on a Datanode is not taking into account some tuples that have already been redistributed?

Method 1 looks useless from the point of view of performance. Method 2 should have good performance; the only point is that data has to be located on the Coordinator server temporarily while redistribution is being done. We could also use some GUC parameters to allow the DBA to customize the way the redistribution data folder is stored (compression type, file name format...). I have some concerns about method 3, as explained above. I might not be taking into account all the potential problems, or may have a limited view of this mechanism, but it introduces some new dependencies on cluster setup which may not be necessary. However, any discussion on the subject is welcome.

Suggestions are welcome.

On Wed, Jun 20, 2012 at 8:40 AM, Michael Paquier <mic...@gm...> wrote:
>
> On Wed, Jun 20, 2012 at 4:19 AM, Abbas Butt <abb...@en...> wrote:
>
>> You forgot to attach the patch.
>>
> Sorry here is the patch.
>
>> On Tue, Jun 19, 2012 at 10:58 AM, Michael Paquier <mic...@gm...> wrote:
>>
>>> Hi all,
>>>
>>> Please find attached an improved patch. I corrected the following points:
>>> - Storage table uses an access exclusive lock meaning it cannot be
>>> accessed by other sessions in cluster
>>> - The table redistributed uses an exclusive lock, it can be accessed by
>>> the other sessions in cluster with SELECT while redistribution is running
>>> - Addition of an API to manage table locking
>>> - Correction of bugs regarding session concurrency. An update in
>>> pgxc_class (update of distribution data) was not seen by concurrent
>>> sessions in cluster.
>>> - doc correction and completion >>> - regression fixes due to grammar change for node list in CTAS, CREATE >>> TABLE, EXECUTE DIRECT and CLEAN CONNECTION >>> - Fix of system functions using EXECUTE direct >>> - Fix for CTAS query generation >>> - update index of catalog pgxc_class updated >>> - Correct update for relation cache when location data is updated >>> >>> Questions are welcome. >>> This patch can be applied on master and works as expected. >>> >>> On Mon, Jun 18, 2012 at 5:25 PM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi all, >>>> >>>> Based on the design above, I went to the end of my idea and took a day >>>> to write a prototype for online redistribution based on ALTER TABLE. >>>> It uses the grammar written in previous mail with ADD NODE/DELETE >>>> NODE/DISTRIBUTE BY/TO NODE | GROUP. >>>> >>>> The main idea is the use of what I call a "storage" table which is used >>>> as a temporary location for the data being distributed in cluster. >>>> This table is created as unlogged >>>> >>>> The patch sticks with the design invocated before; >>>> - Cached plans are dropped when redistribution is invocated >>>> - Vacuum is not necessary, this mechanism uses transaction-safe queries >>>> - for the time being, this implementation uses an exclusive lock, but >>>> as the redistribution is done, a ShareUpdateExclusive lock is not to >>>> exclude. >>>> - tables are reindexed if necessary. >>>> - redistribution cannot be done inside a transaction block >>>> - redistribution is not authorized with all the other commands as they >>>> are locally-safe on each node. >>>> - no restrictions on the distribution types, table types or subclusters >>>> >>>> This feature can be really improved for example in the case of >>>> replicated tables in particular, when the list of nodes of the table is >>>> changed. 
>>>> It is one of the things I would like to improve as it would really >>>> increase performance >>>> >>>> Regards, >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. 
If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > Michael Paquier > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |