From: Koichi S. <koi...@gm...> - 2010-12-27 07:44:36
Hi all;

Sorry, I made a stupid mistake and included a portion of an error message from mailman. Please use this message to respond.

I'm now considering changing the Postgres-XC license from LGPL to BSD (the license of PostgreSQL) so that XC's work can easily be migrated into the PostgreSQL core as well. I'd like to have input, especially from those who contributed code, before proceeding with this change.

Thank you;
----------
Koichi Suzuki
From: Devrim G. <de...@gu...> - 2010-12-27 07:38:28
Hi,

On Mon, 2010-12-27 at 16:35 +0900, Koichi Suzuki wrote:
> I'm now considering to change Postgres-XC license from LGPL to BSD
> (and that of PostgreSQL) so that XC's work can be easily migrated into
> PostgreSQL core as well. I'd like to have an input, especially those
> who contributed the code, to proceed this change.

That's excellent!

> ------------------------------------------------------------------------------
> Learn how Oracle Real Application Clusters (RAC) One Node allows
> customers to consolidate database storage, standardize their database
> environment, and, should the need arise, upgrade to a full multi-node
> Oracle RAC database without downtime or disruption
> http://p.sf.net/sfu/oracle-sfdevnl

Heh :)

--
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz
From: Koichi S. <koi...@gm...> - 2010-12-27 07:36:02
Hi all;

I'm now considering changing the Postgres-XC license from LGPL to BSD (the license of PostgreSQL) so that XC's work can easily be migrated into the PostgreSQL core as well. I'd like to have input, especially from those who contributed code, before proceeding with this change.

Thank you very much in advance.
----------
Koichi Suzuki
From: xiong w. <wan...@gm...> - 2010-12-27 06:42:49
Dears,

The enclosure is a patch for bug #3141640: the SELECT result is wrong when the table has no columns.

Regards,
Benny
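A minimal way to reproduce the zero-column case (a sketch assuming stock PostgreSQL behavior; the table name is hypothetical, not taken from the report) would be:

-- PostgreSQL allows tables with no columns; a SELECT against one
-- should return bare rows rather than an error or a wrong result.
CREATE TABLE no_cols ();
INSERT INTO no_cols DEFAULT VALUES;
SELECT * FROM no_cols;  -- expected: one row with zero columns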
From: xiong w. <wan...@gm...> - 2010-12-27 06:37:52
Dears,

The enclosure is the patch for bug #3142311: renaming sequences raises an error.

Regards,
Benny
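A reproduction sketch (hypothetical names, assuming the standard ALTER SEQUENCE ... RENAME TO form; the failure mode is as reported, not verified here):

-- Renaming a sequence; in XC the sequence is also tracked on GTM,
-- which is presumably where the rename went wrong.
CREATE SEQUENCE seq_old;
ALTER SEQUENCE seq_old RENAME TO seq_new;
SELECT nextval('seq_new');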
From: xiong w. <wan...@gm...> - 2010-12-23 06:51:27
Hi,

For the sequence, we tried to handle execution by rewriting the statement. For example, given:

create table t(a int, b int default nextval('seq1'));

if we insert a tuple like "insert into t values(1)", we replace the statement with "insert into t values(1,2)" (here we assume nextval('seq1') returns 2). But we hit a problem: we could not handle adding a column with default nextval('seq1') to a table that already holds multiple tuples. Because the column needs to increase automatically, we cannot use a single sequence value to replace the default value. For example:

create table t1(a int);
insert into t1 values(1),(1),(1);
TEST=# select * from t1;
 a
---
 1
 1
 1
(3 rows)

alter table t1 add column b int default nextval('seq1');
TEST=# select * from t1;
 a | b
---+---
 1 | 1
 1 | 2
 1 | 3
(3 rows)

Regards,
Benny

On 9 December 2010 at 09:15, Koichi Suzuki <ko...@in...> wrote:
> This should be put into the tracker.
>
> I discussed with Michael and found that the issue is not that simple,
> because we should consider the case of the replicated table. It is not
> correct to get the sequence value directly from GTM to a Datanode in this
> case. The Coordinator should handle this.
>
> Regards;
> ---
> Koichi
>
> (2010/12/09 10:11), xiong wang wrote:
>> Hi Koichi,
>>
>> Yes, I consider sequences should be created on datanodes, not only
>> on coordinators. But all sequence values should come from GTM.
>>
>> Regards,
>> Benny
>>
>> 2010/12/9 Koichi Suzuki <ko...@in...>:
>>> In the current implementation, the sequence value is supplied by GTM, as you
>>> know. It is assumed that this value is supplied to the datanode through
>>> the coordinator. In your case, the default value must be handled
>>> by the datanode, and the datanode has to ask GTM for the nextval of the
>>> sequence.
>>>
>>> I'm afraid this is missing in the current code.
>>> ---
>>> Koichi
>>>
>>> (2010/12/08 19:33), xiong wang wrote:
>>>> Dears,
>>>>
>>>> steps:
>>>> postgres=# create sequence seq start with 1;
>>>> CREATE SEQUENCE
>>>> postgres=# create table t(a int default nextval('seq'), b int);
>>>> ERROR: Could not commit (or autocommit) data node connection
>>>>
>>>> datanode log as follows:
>>>> LOG: statement: create table t(a int default nextval('seq'), b int);
>>>> ERROR: relation "seq" does not exist
>>>>
>>>> When I checked the source code, I found sequences can't be created on
>>>> datanodes. Could you explain why?
>>>>
>>>> Regards,
>>>> Benny
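One way to picture the per-row handling being discussed (a sketch of the idea only, not the actual XC implementation) is a coordinator-side rewrite that splits the ALTER into steps, so that each existing row gets its own GTM-backed value:

-- Hypothetical equivalent rewrite: fetch one sequence value per
-- existing row instead of substituting a single literal.
ALTER TABLE t1 ADD COLUMN b int;
UPDATE t1 SET b = nextval('seq1');                         -- one value per row
ALTER TABLE t1 ALTER COLUMN b SET DEFAULT nextval('seq1'); -- future inserts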
From: Michael P. <mic...@gm...> - 2010-12-22 02:12:43
On Wed, Dec 22, 2010 at 10:23 AM, Mason Sharp <mas...@en...> wrote:
> After "make clean; make", things look better.

Thanks for taking the time to check that.

> I found another issue though. Still, you can go ahead and commit this since
> it is close, in order to make merging easier.

I'll do it, thanks.

> If the coordinator tries to commit the prepared transactions, if it sends
> commit prepared to one of the nodes, then is killed before it can send to
> the other, if I restart the coordinator, I see the data from one of the
> nodes only (GTM closed the transaction), which is not atomic. The second
> data node is still alive and was the entire time.

That is true: if the coordinator crashes, GTM closes all the backends of transactions that it considers open. In the case of an implicit COMMIT, even if we prepare/commit on the nodes, the transaction is still seen as open on GTM.

> I fear we may have to treat implicit transactions similar to explicit
> transactions. (BTW, do we handle explicit properly for these similar cases,
> too?) If we stick with performance short cuts it is hard to be reliably
> atomic. (Again, I will take the blame for trying to speed things up.
> Perhaps we can have it as a configuration option if people have a lot of
> implicit 2PC going on and understand the risks.)

Yeah, I think so. A GUC parameter would do it, but I'd like to discuss this more before deciding anything.

> Anyway, the transaction would remain open, but it would have to be resolved
> somehow.
>
> If we had a "transaction clean up" thread in GTM, it could note the
> transaction information and periodically try and connect to the registered
> nodes and resolve according to the rules we have talked about. (Again, some
> of this code could be in some of the recovery tools you are writing, too.)
> The nice thing about doing something like this is we can automate things as
> much as possible and not require DBA intervention; if a non-GTM component
> goes down and comes up again, things will resolve by themselves. I suppose
> if it is GTM itself that went down, once it rebuilds state properly, this
> same mechanism could be called at the end of GTM recovery and resolve the
> outstanding issues.

That is more or less what we are planning to do with the utility that will check the remaining 2PC transactions after a Coordinator crash. This utility would be kicked off by the monitoring agent when it notices a Coordinator crash. The feature needs two things:
1) a fix for EXECUTE DIRECT
2) an extension of the 2PC table (patch already written, but not yet realigned with the latest 2PC code)

> I think we need to walk through every step in the commit sequence and kill
> an involved process and verify that we have a consistent view of the
> database afterward, and that we have the ability/tools to resolve it.
>
> This code requires careful testing.

That's true; this code could easily lead to unexpected issues when playing with 2PC.

--
Michael Paquier
http://michaelpq.users.sourceforge.net
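The manual resolution such a utility would automate can be sketched with stock two-phase-commit commands (an illustration only; the GID 'T924' is taken from the crash reports in this thread, and the actual XC utility may work differently):

-- Run against each involved node directly.
-- 1) List in-doubt prepared transactions on every node:
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
-- 2) If the GID committed on at least one node, commit it everywhere
--    it is still prepared; otherwise roll it back everywhere:
COMMIT PREPARED 'T924';
-- or:
ROLLBACK PREPARED 'T924';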
From: Mason S. <mas...@en...> - 2010-12-22 01:23:35
On 12/21/10 3:33 AM, Michael Paquier wrote:
> Could you give me more details about this crash?

After "make clean; make", things look better.

I found another issue though. Still, you can go ahead and commit this since it is close, in order to make merging easier.

If the coordinator tries to commit the prepared transactions and sends COMMIT PREPARED to one of the nodes, but is killed before it can send it to the other, then when I restart the coordinator I see the data from only one of the nodes (GTM closed the transaction), which is not atomic. The second data node is still alive and was the entire time.

I fear we may have to treat implicit transactions similar to explicit transactions. (BTW, do we handle explicit ones properly for these similar cases, too?) If we stick with performance short cuts it is hard to be reliably atomic. (Again, I will take the blame for trying to speed things up. Perhaps we can have it as a configuration option if people have a lot of implicit 2PC going on and understand the risks.)

Anyway, the transaction would remain open, but it would have to be resolved somehow.

If we had a "transaction clean up" thread in GTM, it could note the transaction information and periodically try to connect to the registered nodes and resolve according to the rules we have talked about. (Again, some of this code could be in some of the recovery tools you are writing, too.) The nice thing about doing something like this is we can automate things as much as possible and not require DBA intervention; if a non-GTM component goes down and comes up again, things will resolve by themselves. I suppose if it is GTM itself that went down, once it rebuilds state properly, this same mechanism could be called at the end of GTM recovery and resolve the outstanding issues.

I think we need to walk through every step in the commit sequence, kill an involved process, and verify that we have a consistent view of the database afterward, and that we have the ability/tools to resolve it.

This code requires careful testing.

Thanks,

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2010-12-21 08:34:05
Sorry for my late reply; please see my answers inline.

> #0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925,
> pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at
> execRemote.c:1826
> 1826 int co_conn_count = pgxc_handles->co_conn_count;
> (gdb) bt
> #0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925,
> pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at
> execRemote.c:1826
> #1 0x001c2b0d in PGXCNodeImplicitCommitPrepared (prepare_xid=924,
> commit_xid=925, gid=0xbfffef4f "T924", is_commit=1 '\001') at
> execRemote.c:1775
> #2 0x0005845f in CommitTransaction () at xact.c:2013
> #3 0x0005948f in CommitTransactionCommand () at xact.c:2746
> #4 0x0029a6d7 in finish_xact_command () at postgres.c:2437
> #5 0x002980d2 in exec_simple_query (query_string=0x103481c "commit;") at
> postgres.c:1070
> #6 0x0029ccbb in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8
> "masonsharp") at postgres.c:3766
> #7 0x0025848c in BackendRun (port=0x7016f0) at postmaster.c:3607
> #8 0x002577f3 in BackendStartup (port=0x7016f0) at postmaster.c:3216
> #9 0x00254225 in ServerLoop () at postmaster.c:1445
> #10 0x00253831 in PostmasterMain (argc=5, argv=0x7005a0) at
> postmaster.c:1098
> #11 0x001cf261 in main (argc=5, argv=0x7005a0) at main.c:188
>
> pgxc_handles looks ok though. It works ok in your environment?

It looks like it crashed when reading the coordinator connection count from pgxc_handles. I ran a couple of tests in my environment and it worked well, with assertions enabled. By a couple of tests, I mean a sequence creation, a couple of inserts on single and multiple nodes, and a DDL run. Everything went fine. We have already seen in the past that not all problems are reproducible in the environments we use for tests. Could you give me more details about this crash?

--
Michael Paquier
http://michaelpq.users.sourceforge.net
From: xiong w. <wan...@gm...> - 2010-12-21 03:41:27
Dears,

Still related to the patch for bug #3126459:

postgres=# select avg(q1) from test group by q1;
 avg
-----
 0.0
 0.0
(2 rows)

It is confusing that avg always returns 0.

Regards,
Benny

On 21 December 2010 at 10:57, xiong wang <wan...@gm...> wrote:
> Dears,
>
> After I applied my patch for bug #3126459, another aggregate bug occurred.
> Here is some basic information:
>
> steps:
> create table bb(a int, b int);
> insert into bb values(1,2);
> insert into bb values(1,3);
> insert into bb values(4,3);
> insert into bb values(4,5);
> select sum(sum(a)) over (partition by a), count(a) from bb group by a,b
> order by a,b;
>
> core dump:
>
> Core was generated by `postgres: shench postgres [local] SELECT'.
> Program terminated with signal 11, Segmentation fault.
> [New process 29290]
> #0 pg_detoast_datum (datum=0x0) at fmgr.c:2217
> 2217 if (VARATT_IS_EXTENDED(datum))
> (gdb) bt
> #0 pg_detoast_datum (datum=0x0) at fmgr.c:2217
> #1 0x0000000000451188 in printtup (slot=0x14e80538, self=0x14e6d3c0)
> at printtup.c:342
> #2 0x00000000005370f8 in ExecutePlan (estate=0x14e80320,
> planstate=0x14e80640, operation=CMD_SELECT, numberTuples=0,
> direction=<value optimized out>, dest=0x14e6d3c0) at execMain.c:1774
> #3 0x000000000053763c in standard_ExecutorRun (queryDesc=0x14e2e520,
> direction=ForwardScanDirection, count=0) at execMain.c:312
> #4 0x00000000005ecb24 in PortalRunSelect (portal=0x14e7c300,
> forward=<value optimized out>, count=0, dest=0x14e6d3c0) at pquery.c:967
> #5 0x00000000005edd40 in PortalRun (portal=0x14e7c300,
> count=9223372036854775807, isTopLevel=1 '\001', dest=0x14e6d3c0,
> altdest=0x14e6d3c0, completionTag=0x7fffc47a7710 "") at pquery.c:793
> #6 0x00000000005e91a1 in exec_simple_query (
> query_string=0x14e18880 "select sum(sum(a)) over (partition by
> a),count(a) from bb group by a,b order by a,b;") at postgres.c:1053
> #7 0x00000000005ea7b6 in PostgresMain (argc=4, argv=<value optimized
> out>, username=0x14d6e290 "shench") at postgres.c:3766
> #8 0x00000000005c07cc in ServerLoop () at postmaster.c:3607
> #9 0x00000000005c2a1c in PostmasterMain (argc=9, argv=0x14d6b730) at
> postmaster.c:1098
> #10 0x000000000056d5ae in main (argc=9, argv=<value optimized out>) at
> main.c:188
>
> Regards,
> Benny
From: xiong w. <wan...@gm...> - 2010-12-21 02:43:38
Dears,

The bug that Mason reported is indeed caused by alias processing, as Mason suggested. It has nothing to do with bug #3126459 (select error with group by .. order by ..), because a statement like

select count(t2.*) from test t1 left join test t2 on (t1.q2 = t2.q1);

has the same problem.

The problem is caused by line 2239 in the create_remotequery_plan function:

deparse_context = deparse_context_for_remotequery(get_rel_name(rte->relid), rte->relid);

If the first parameter of deparse_context_for_remotequery, get_rel_name(rte->relid), is changed to rte->eref->aliasname, the problem that Mason mentioned is resolved.

But when I executed the statement after fixing that bug, it introduced a segmentation fault. Here is some basic information from gdb:

$28 = {type = T_TupleTableSlot, tts_isempty = 0 '\0', tts_shouldFree = 0 '\0', tts_shouldFreeMin = 0 '\0',
  tts_slow = 0 '\0', tts_tuple = 0x0, tts_dataRow = 0x0, tts_dataLen = -1, tts_dataNode = 0,
  tts_shouldFreeRow = 0 '\0', tts_attinmeta = 0x0, tts_tupleDescriptor = 0xc797060, tts_mcxt = 0xc782db0,
  tts_buffer = 0, tts_nvalid = 2, tts_values = 0xc797270, tts_isnull = 0xc797290 "", tts_mintuple = 0x0,
  tts_minhdr = {t_len = 0, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0,
  t_data = 0x0}, tts_off = 0}

Generally, tts_dataRow should have a value, as I understand it. But as you can see above, tts_dataRow is null, so Postgres-XC cannot deform tts_dataRow into datum arrays. I don't know whether I am right, or why such a problem occurs; I hope you can give me some advice. Only count(t2.*) results in this problem; other aggregate calls such as count(t2.q1) or sum(t2.q1) do not.

Btw, the following is the core dump information:

(gdb) bt
#0 0x0000000000450ec9 in heap_form_minimal_tuple (tupleDescriptor=0x8b061d8, values=0x8b063e8, isnull=0x8b06408 "") at heaptuple.c:1565
#1 0x0000000000598cae in ExecCopySlotMinimalTuple (slot=0x8b040a8) at execTuples.c:790
#2 0x00000000007a4e22 in tuplestore_puttupleslot (state=0x8b13d90, slot=0x8b040a8) at tuplestore.c:546
#3 0x00000000005a5b5a in ExecMaterial (node=0x8b05930) at nodeMaterial.c:109
#4 0x000000000058d563 in ExecProcNode (node=0x8b05930) at execProcnode.c:428
#5 0x00000000005a7bb0 in ExecNestLoop (node=0x8b049f0) at nodeNestloop.c:154
#6 0x000000000058d52d in ExecProcNode (node=0x8b049f0) at execProcnode.c:413
#7 0x000000000059e28d in agg_fill_hash_table (aggstate=0x8b04430) at nodeAgg.c:1054
#8 0x000000000059de85 in ExecAgg (node=0x8b04430) at nodeAgg.c:833
#9 0x000000000058d599 in ExecProcNode (node=0x8b04430) at execProcnode.c:440
#10 0x000000000058ac36 in ExecutePlan (estate=0x8b03b10, planstate=0x8b04430, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, dest=0x8af1de0) at execMain.c:1520
#11 0x0000000000588edc in standard_ExecutorRun (queryDesc=0x8a8bcc0, direction=ForwardScanDirection, count=0) at execMain.c:312
#12 0x0000000000588de5 in ExecutorRun (queryDesc=0x8a8bcc0, direction=ForwardScanDirection, count=0) at execMain.c:261
#13 0x000000000068f7a5 in PortalRunSelect (portal=0x8b01b00, forward=1 '\001', count=0, dest=0x8af1de0) at pquery.c:967
#14 0x000000000068f448 in PortalRun (portal=0x8b01b00, count=9223372036854775807, isTopLevel=1 '\001', dest=0x8af1de0, altdest=0x8af1de0, completionTag=0x7fff49c60db0 "") at pquery.c:793
#15 0x000000000068983a in exec_simple_query (query_string=0x8a75f40 "select count(t2.*), t2.q1 from test t1 left join test t2 on (t1.q2 = t2.q1) group by t2.q1;") at postgres.c:1053
#16 0x000000000068d7a8 in PostgresMain (argc=4, argv=0x89cb560, username=0x89cb520 "postgres") at postgres.c:3766
#17 0x000000000065619e in BackendRun (port=0x89ecbf0) at postmaster.c:3607
#18 0x00000000006556fb in BackendStartup (port=0x89ecbf0) at postmaster.c:3216
#19 0x0000000000652ac6 in ServerLoop () at postmaster.c:1445
#20 0x000000000065226c in PostmasterMain (argc=9, argv=0x89c8910) at postmaster.c:1098
#21 0x00000000005d9bcf in main (argc=9, argv=0x89c8910) at main.c:188

Looking forward to your reply.

Regards,
Benny
From: Mason S. <mas...@en...> - 2010-12-20 15:35:17
On 12/14/10 9:37 PM, Michael Paquier wrote:
>> Just took a brief look so far. Seems better.
>>
>> I understand that recovery and HA is in development and things are
>> being done to lay the groundwork and improve, and that with this
>> patch we are not trying to yet handle any and every situation.
>> What happens if the coordinator fails before it can update GTM though?
>
> In this case the information is not saved on GTM.
> For a Coordinator crash, I was thinking of an external utility
> associated with the monitoring agent, in charge of analyzing prepared
> transactions of the crashed Coordinator.
> This utility would analyze in the cluster the prepared transactions of
> the crashed Coordinator, and decide automatically which ones to abort or
> commit depending on the transaction situation.
>
> For this purpose, it is essential to extend the 2PC information sent
> to nodes (Datanodes of course, but Coordinators included in case of DDL).
> The patch extending 2PC information on nodes is also on this thread
> (patch based on version 6 of the implicit 2PC patch).
> In this case I believe it is not necessary to save any info on GTM, as
> the extended 2PC information alone would be enough to analyze the
> 2PC transactions of the crashed Coordinator.
>
>> Also, I did a test and got this:
>>
>> WARNING: unexpected EOF on datanode connection
>> WARNING: Connection to Datanode 1 has unexpected state 1 and will
>> be dropped
>>
>> ERROR: Could not commit prepared transaction implicitely
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>>
>> #0 0x907afe42 in kill$UNIX2003 ()
>> #1 0x9082223a in raise ()
>> #2 0x9082e679 in abort ()
>> #3 0x003917ce in ExceptionalCondition (conditionName=0x433f6c
>> "!(((proc->xid) != ((TransactionId) 0)))", errorType=0x3ecfd4
>> "FailedAssertion", fileName=0x433f50 "procarray.c",
>> lineNumber=283) at assert.c:57
>> #4 0x00280916 in ProcArrayEndTransaction (proc=0x41cca70,
>> latestXid=1018) at procarray.c:283
>> #5 0x0005905c in AbortTransaction () at xact.c:2525
>> #6 0x00059a6e in AbortCurrentTransaction () at xact.c:3001
>> #7 0x00059b10 in AbortCurrentTransactionOnce () at xact.c:3094
>> #8 0x0029c8d6 in PostgresMain (argc=4, argv=0x1002ff8,
>> username=0x1002fc8 "masonsharp") at postgres.c:3622
>> #9 0x0025851c in BackendRun (port=0x7016f0) at postmaster.c:3607
>> #10 0x00257883 in BackendStartup (port=0x7016f0) at postmaster.c:3216
>> #11 0x002542b5 in ServerLoop () at postmaster.c:1445
>> #12 0x002538c1 in PostmasterMain (argc=5, argv=0x7005a0) at
>> postmaster.c:1098
>> #13 0x001cf2f1 in main (argc=5, argv=0x7005a0) at main.c:188
>
> I suppose you enabled assertions when doing this test.
> The Coordinator was complaining that its transaction ID in PGProc was
> not correct.
> This is indeed true, as in the tested case the transaction had already
> committed on the Coordinator.

I tried out the latest patch and it still crashes the coordinator.

#0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925, pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1826
1826 int co_conn_count = pgxc_handles->co_conn_count;
(gdb) bt
#0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925, pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1826
#1 0x001c2b0d in PGXCNodeImplicitCommitPrepared (prepare_xid=924, commit_xid=925, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1775
#2 0x0005845f in CommitTransaction () at xact.c:2013
#3 0x0005948f in CommitTransactionCommand () at xact.c:2746
#4 0x0029a6d7 in finish_xact_command () at postgres.c:2437
#5 0x002980d2 in exec_simple_query (query_string=0x103481c "commit;") at postgres.c:1070
#6 0x0029ccbb in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8 "masonsharp") at postgres.c:3766
#7 0x0025848c in BackendRun (port=0x7016f0) at postmaster.c:3607
#8 0x002577f3 in BackendStartup (port=0x7016f0) at postmaster.c:3216
#9 0x00254225 in ServerLoop () at postmaster.c:1445
#10 0x00253831 in PostmasterMain (argc=5, argv=0x7005a0) at postmaster.c:1098
#11 0x001cf261 in main (argc=5, argv=0x7005a0) at main.c:188

pgxc_handles looks ok though. It works ok in your environment?

>> I did the same test as before. I killed a data node after it
>> received a COMMIT PREPARED message.
>>
>> I think we should be able to continue.
>>
>> The good news is that I should not see partially committed data,
>> which I do not.
>>
>> But if I try and manually commit it from a new connection to the
>> coordinator:
>>
>> mds=# COMMIT PREPARED 'T1018';
>> ERROR: Could not get GID data from GTM
>>
>> Maybe GTM removed this info when the coordinator disconnected? (Or
>> maybe implicit transactions are only associated with a certain
>> connection?)
>
> Yes, it was removed when your Coordinator instance crashed.
>
>> I can see the transaction on one data node, but not the other.
>>
>> Ideally we would come up with a scheme where, if the coordinator
>> session does not notify GTM, we can somehow recover. Maybe this
>> is my fault; I believe I advocated avoiding the extra work for
>> implicit 2PC in the name of performance. :-)
>>
>> We can think about what to do in the short term, and how to handle it
>> in the long term.
>>
>> In the short term, your approach may be good enough once debugged,
>> since it is a relatively rare case.
>>
>> Long term we could think about a thread that runs on GTM and wakes
>> up every 30 or 60 seconds or so (configurable), collects implicit
>> transactions from the nodes (extension to pg_prepared_xacts
>> required?) and, if it sees that the XID does not have an associated
>> live connection, knows that something went awry. It then sees if
>> it committed on any of the nodes. If not, roll back on all; if it did
>> on at least one, commit on all. If one of the data nodes is down, it
>> won't do anything, perhaps log a warning. This would avoid user
>> intervention, and would be pretty cool. Some of this code you may
>> already have been working on for recovery and we could reuse here.
>
> This is a nice idea.
> It depends of course on one thing: whether we decide to base the HA
> features on a monitoring agent only, or whether XC should be able to run
> on its own (or even allow both modes).
> We can think about it...

It could be separate from GTM, part of a monitoring process.

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Mason S. <mas...@en...> - 2010-12-17 15:25:11
On 12/16/10 7:18 PM, Koichi Suzuki wrote:
> Hmm... I thought it will be reasonable enough just to allow SELECT
> (and COMMIT/ABORT) statement in EXECUTE DIRECT semantics. Also,
> because we've changed the infrastructure of aggregate functions, I
> agree it will not safe enough to run such functions just in the
> coordinator. We need an infrastructure as Benny pointed out:
>
> SELECT count(*) from A, A;
>
> Because EXECUTE DIRECT is just for housekeeping usage, I think it will
> also be reasonable to put some restriction which is sufficient for the
> dedicated use.
>
> In this case, because 2PC recovery does not need aggregate, I think we
> can have this as is.

Aggregate was an example. I am not sure; there may be other unexpected side effects. I just stumbled upon this when I started testing.

I think we should really keep this simple and just pass the statement down to the nodes as is. That is intuitive.

The results I see are kind of weird. It is not simply passing down the statement but somehow trying to parallelize it. I don't think that is what we want, and I am worried about unexpected results for other statements.

I really think we should change this.

Thanks,

Mason

> Regards;
> ---
> Koichi

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Mason S. <mas...@en...> - 2010-12-17 15:21:12
On 12/16/10 9:00 PM, xiong wang wrote:
> Hi Mason,
>
> I also found some other errors after I submit the patch, which is
> relative with such a bug. I will fix the problems your mentioned and
> we found.

OK. If it involves multiple remote queries (or join reduction) and looks difficult, it might make more sense to let us know. I think Pavan is very familiar with that code and might be able to fix it quickly.

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
From: xiong w. <wan...@gm...> - 2010-12-17 02:00:17
Hi Mason,

I also found some other errors after I submitted the patch, which are related to this bug. I will fix the problems you mentioned and the ones we found.

Regards,
Benny

On 17 December 2010 at 03:05, Mason Sharp <mas...@en...> wrote:
> Thanks, Benny.
>
> You definitely are addressing a bug that got introduced at some point, but
> now I get a different error for the case in question:
>
> mds=# select t1.q2, count(t2.*)
>       from int8_tbl t1 left join int8_tbl t2 on (t1.q2 = t2.q1)
>       group by t1.q2 order by 1;
> ERROR: invalid reference to FROM-clause entry for table "int8_tbl"
>
> That is probably due to general RemoteQuery handling and aliasing.
>
> Anyway, I can imagine that your fix also addresses other reported issues.
>
> Thanks,
>
> Mason
From: Koichi S. <suz...@os...> - 2010-12-17 00:41:01
Hmm... I thought it would be reasonable enough just to allow SELECT (and COMMIT/ABORT) statements in EXECUTE DIRECT semantics. Also, because we've changed the infrastructure of aggregate functions, I agree it is not safe enough to run such functions just in the coordinator. We need an infrastructure for that, as Benny pointed out:

SELECT count(*) from A, A;

Because EXECUTE DIRECT is just for housekeeping usage, I think it would also be reasonable to put on it a restriction which is sufficient for the dedicated use.

In this case, because 2PC recovery does not need aggregates, I think we can have this as is.

Regards;
---
Koichi

(2010/12/17 07:09), Mason Sharp wrote:
> DBT1=# EXECUTE DIRECT on NODE 1,2 'select count(*) from orders';
>  count
> -------
>   2601
> (1 row)
>
> For this last one, I expected to see two rows. That is, it passes down
> the exact SQL string, then shows the results of each. It looks like it
> is hooking into our general planning. We don't want the aggregate
> managed on the coordinator (hmmm, although it may open up interesting
> ideas in the future...).
From: Mason S. <mas...@en...> - 2010-12-16 22:09:49
On 12/16/10 1:51 AM, Michael Paquier wrote:
> Hi all,
>
> I extended the patch so as to be able to launch utilities on targeted
> nodes (datanodes and Coordinators).
> EXECUTE DIRECT is still restricted for UPDATE and DELETE.
> And it is still not possible to launch a query on the local
> Coordinator without spreading it to the other nodes.
>
> With this patch, in the case of a 2PC transaction that is partially
> committed or partially aborted in the cluster,
> EXECUTE DIRECT can be used to target specific nodes where to send a
> COMMIT PREPARED or ABORT PREPARED.
>
> This is definitely useful for HA features and recovery also.

Michael,

in pgxc_planner(), is that block of code only for when executing on a local coordinator? Could it be safely handled above the switch() statement? I mean, if it is EXECUTE DIRECT, we just want to pass down the SQL string and have it executed as is.

I ran some brief tests.

DBT1=# EXECUTE DIRECT on NODE 1 'select count(*) from orders';
 count
-------
  1269
(1 row)

DBT1=# EXECUTE DIRECT on NODE 2 'select count(*) from orders';
 count
-------
  1332
(1 row)

DBT1=# EXECUTE DIRECT on NODE 1,2 'select count(*) from orders';
 count
-------
  2601
(1 row)

For this last one, I expected to see two rows. That is, it passes down the exact SQL string, then shows the results of each. It looks like it is hooking into our general planning. We don't want the aggregate managed on the coordinator (hmmm, although it may open up interesting ideas in the future...).

Similarly, something is not quite right with group by:

DBT1=# EXECUTE DIRECT on NODE 1,2 'select o_status, count(*) from orders group by o_status';
ERROR: unrecognized node type: 656

DBT1=# EXECUTE DIRECT on NODE 2 'select o_status, count(*) from orders group by o_status';
 o_status | count
----------+-------
          |  1332
(1 row)

Here, too, I think we should just get the results as if 'select o_status, count(*) from orders group by o_status' was executed on each node, all thrown together in the results (long term we could add an optional NODE column, like GridSQL).

Perhaps this helps simplify things a bit.

Thanks,

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Mason S. <mas...@en...> - 2010-12-16 19:06:13
|
> ---------- Forwarded message ----------
> From: xiong wang <wan...@gm...>
> Date: Dec 15, 2010, 11:02 AM
> Subject: patch for bug#3126459: select error (group by .. order by ..)
> To: pos...@li...
>
> Dears,
>
> The enclosure is the patch for bug#3126459: select error (group by ..
> order by ..).
>
> Your advice will be appreciated.
>
> Btw, I fixed what I believe is an error: the variable standardPlan is
> always a freed pointer.
>
> Regards,
> Benny

Thanks, Benny. You definitely are addressing a bug that got introduced at
some point, but now I get a different error for the case in question:

mds=# select t1.q2, count(t2.*) from int8_tbl t1 left join int8_tbl t2
on (t1.q2 = t2.q1) group by t1.q2 order by 1;
ERROR: invalid reference to FROM-clause entry for table "int8_tbl"

That is probably due to general RemoteQuery handling and aliasing.
Anyway, I can imagine that your fix also addresses other reported issues.

Thanks,

Mason

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2010-12-16 07:20:38
|
Hi all,

I extended the patch so as to be able to launch utilities on targeted
nodes (datanodes and Coordinators).
EXECUTE DIRECT is still restricted for UPDATE and DELETE.
And it is still not possible to launch a query on the local Coordinator
without spreading it to the other nodes.

With this patch, in the case of a 2PC transaction that is partially
committed or partially aborted in the cluster, EXECUTE DIRECT can be used
to target the specific nodes to which a COMMIT PREPARED or ABORT PREPARED
should be sent.

This is definitely useful for HA features and recovery also.

Thanks,

--
Michael Paquier
http://michaelpq.users.sourceforge.net
From: Michael P. <mic...@gm...> - 2010-12-16 05:24:58
|
I corrected the comments a little bit. Please see the latest version
attached.

--
Michael Paquier
http://michaelpq.users.sourceforge.net
From: Michael P. <mic...@gm...> - 2010-12-16 05:11:08
|
Please see attached a patch fixing EXECUTE DIRECT. It has been extended
to Coordinators also, so the SQL synopsis becomes:

EXECUTE DIRECT on { COORDINATOR num | NODE num[,num] } query;

I put the following restrictions on this functionality:

1) Only SELECT queries can be used with EXECUTE DIRECT. It would perhaps
be better to also allow queries such as COMMIT PREPARED and ABORT
PREPARED.

2) It cannot be launched on multiple Coordinators at the same time; it is
possible on multiple nodes, though. If a query is launched at the same
time on the local Coordinator and a remote Coordinator, XC is not able to
merge the results well.

There is still one bug: when launching EXECUTE DIRECT on the local
Coordinator with a query containing the name of a non-catalog table, this
query is launched on the nodes. I was looking for a fix in allpaths.c,
where RemoteQuery paths are set, but a fix there looks a little bit
tricky. Btw, it is not really important for the HA features in the short
term, as EXECUTE DIRECT is planned to be used to have a look at catalog
tables on remote Coordinators (and perhaps to target nodes with
COMMIT/ABORT PREPARED queries).

Thanks,

--
Michael Paquier
http://michaelpq.users.sourceforge.net
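As an illustration of this synopsis, the following forms would be accepted under the patch as described; the node and Coordinator numbers are arbitrary examples:

-- A catalog lookup on a single remote Coordinator.
EXECUTE DIRECT on COORDINATOR 2 'SELECT datname FROM pg_database';

-- The same kind of read-only string sent to a list of data nodes.
EXECUTE DIRECT on NODE 1,2 'SELECT gid FROM pg_prepared_xacts';

The first targets one remote Coordinator (per restriction 2, only one Coordinator can be targeted at a time), while the second fans the identical string out to several data nodes.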
From: Michael P. <mic...@gm...> - 2010-12-15 06:37:49
|
Please see attached a work-in-progress patch. Avoid applying it to your
code, because it doesn't work yet. I am sending it because I would need
some feedback.

The patch is decomposed in two parts. First the query is analyzed. In the
case of an EXECUTE DIRECT being launched on the local Coordinator, the
query is parsed and analyzed, then it is returned as a normal Query node.
As this query is analyzed, it can go through the planner. For an EXECUTE
DIRECT on a remote node, the query is analyzed to get the command type
for pgxc_planner, and the list of nodes is saved in a RemoteQuery node
that is returned with the Query result using utilityStmt.

I tried to change the pgxc planner to manage the particular case of
EXECUTE DIRECT by keeping in the planner the node list set in analyze,
but it doesn't seem to be the right way of doing it. I am not really an
expert in this part of the code, so feedback would be appreciated,
particularly on the following points:

- Is this patch using the correct logic in planner and analyze?
- Does the query really need to go through the planner? In that case, is
setting Query as CMD_UTILITY with a RemoteQuery node in utilityStmt
enough when analyzing? (The patch currently does NOT do it.)
- Are the Query fields set in analyze correct? Isn't there something
missing in the planner that is not set?
- We rewrite the statement in Query at the end of pg_analyze_rewrite in
postgres.c, but for EXECUTE DIRECT it is not the same query. Is it
correct to change it directly in the XC planner?

--
Michael Paquier
http://michaelpq.users.sourceforge.net
From: xiong w. <wan...@gm...> - 2010-12-15 03:06:20
|
Hi,

Sorry for missing the enclosure.

Regards,
Benny

---------- Forwarded message ----------
From: xiong wang <wan...@gm...>
Date: Dec 15, 2010, 11:02 AM
Subject: patch for bug#3126459: select error (group by .. order by ..)
To: pos...@li...

Dears,

The enclosure is the patch for bug#3126459: select error (group by ..
order by ..).

Your advice will be appreciated.

Btw, I fixed what I believe is an error: the variable standardPlan is
always a freed pointer.

Regards,
Benny
From: xiong w. <wan...@gm...> - 2010-12-15 03:03:01
|
Dears,

The enclosure is the patch for bug#3126459: select error (group by ..
order by ..).

Your advice will be appreciated.

Btw, I fixed what I believe is an error: the variable standardPlan is
always a freed pointer.

Regards,
Benny
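For reference, a minimal query of the shape this bug report covers, with illustrative table and column names, would be:

SELECT o_status, count(*) FROM orders GROUP BY o_status ORDER BY 1;

(Mason's follow-up elsewhere in this thread shows an actual failing case of this shape against int8_tbl.)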
From: Michael P. <mic...@gm...> - 2010-12-15 02:37:12
|
> Just took a brief look so far. Seems better.
>
> I understand that recovery and HA is in development and things are being
> done to lay the groundwork and improve, and that with this patch we are
> not yet trying to handle any and every situation. What happens if the
> coordinator fails before it can update GTM though?

In this case the information is not saved on GTM. For a Coordinator
crash, I was thinking of an external utility associated with the
monitoring agent, in charge of analyzing the prepared transactions of the
crashed Coordinator. This utility would analyze in the cluster the
prepared transactions of the crashed Coordinator, and decide
automatically which ones to abort or commit depending on each
transaction's situation. For this purpose, it is essential to extend the
2PC information sent to nodes (Datanodes of course, but Coordinators
included in the case of DDL). The patch extending 2PC information on
nodes is also on this thread (a patch based on version 6 of the implicit
2PC patch). In this case I believe it is not necessary to save any info
on GTM, as only the extended 2PC information would be needed to analyze
the 2PC transactions of the crashed Coordinator.

> Also, I did a test and got this:
>
> WARNING: unexpected EOF on datanode connection
> WARNING: Connection to Datanode 1 has unexpected state 1 and will be
> dropped
>
> ERROR: Could not commit prepared transaction implicitely
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> #0 0x907afe42 in kill$UNIX2003 ()
> #1 0x9082223a in raise ()
> #2 0x9082e679 in abort ()
> #3 0x003917ce in ExceptionalCondition (conditionName=0x433f6c
> "!(((proc->xid) != ((TransactionId) 0)))", errorType=0x3ecfd4
> "FailedAssertion", fileName=0x433f50 "procarray.c", lineNumber=283) at
> assert.c:57
> #4 0x00280916 in ProcArrayEndTransaction (proc=0x41cca70, latestXid=1018)
> at procarray.c:283
> #5 0x0005905c in AbortTransaction () at xact.c:2525
> #6 0x00059a6e in AbortCurrentTransaction () at xact.c:3001
> #7 0x00059b10 in AbortCurrentTransactionOnce () at xact.c:3094
> #8 0x0029c8d6 in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8
> "masonsharp") at postgres.c:3622
> #9 0x0025851c in BackendRun (port=0x7016f0) at postmaster.c:3607
> #10 0x00257883 in BackendStartup (port=0x7016f0) at postmaster.c:3216
> #11 0x002542b5 in ServerLoop () at postmaster.c:1445
> #12 0x002538c1 in PostmasterMain (argc=5, argv=0x7005a0) at
> postmaster.c:1098
> #13 0x001cf2f1 in main (argc=5, argv=0x7005a0) at main.c:188

I suppose you enabled assertions when doing this test. The Coordinator
was complaining that its transaction ID in PGProc was not correct. That
is indeed true, as in the tested case the transaction had already
committed on the Coordinator.

> I did the same test as before. I killed a data node after it received a
> COMMIT PREPARED message.
>
> I think we should be able to continue.
>
> The good news is that I should not see partially committed data, which I
> do not.
>
> But if I try and manually commit it from a new connection to the
> coordinator:
>
> mds=# COMMIT PREPARED 'T1018';
> ERROR: Could not get GID data from GTM
>
> Maybe GTM removed this info when the coordinator disconnected? (Or maybe
> implicit transactions are only associated with a certain connection?)

Yes, it has been removed when your Coordinator instance crashed.

> I can see the transaction on one data node, but not the other.
> Ideally we would come up with a scheme where, if the coordinator session
> does not notify GTM, we can somehow recover. Maybe this is my fault; I
> believe I advocated avoiding the extra work for implicit 2PC in the name
> of performance. :-)
>
> We can think about what to do in the short term, and how to handle it in
> the long term.
>
> In the short term, your approach may be good enough once debugged, since
> it is a relatively rare case.
>
> Long term we could think about a thread that runs on GTM and wakes up
> every 30 or 60 seconds or so (configurable), collects implicit
> transactions from the nodes (extension to pg_prepared_xacts required?)
> and, if it sees that the XID does not have an associated live connection,
> knows that something went awry. It then sees if it committed on any of
> the nodes. If not, roll back on all; if it did on at least one, commit on
> all. If one of the data nodes is down, it won't do anything, perhaps log
> a warning. This would avoid user intervention, and would be pretty cool.
> Some of this code you may already have been working on for recovery and
> we could reuse here.

This is a nice idea. It depends of course on one thing: whether we decide
to base the HA features on a monitoring agent only, or whether XC should
be able to run on its own (or even allow both modes).

--
Michael Paquier
http://michaelpq.users.sourceforge.net
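As a sketch of the short-term, manual variant of this recovery procedure, the sequence below shows how an operator or monitoring agent might resolve an in-doubt implicit transaction, assuming EXECUTE DIRECT gains support for COMMIT PREPARED / ROLLBACK PREPARED as discussed in this thread; the node numbers and the GID 'T1018' are illustrative:

-- 1. Check each data node for a leftover prepared transaction.
EXECUTE DIRECT on NODE 1 'SELECT gid, prepared FROM pg_prepared_xacts';
EXECUTE DIRECT on NODE 2 'SELECT gid, prepared FROM pg_prepared_xacts';

-- 2. If it already committed on at least one node, commit the
--    remaining prepared copies; otherwise roll them all back.
EXECUTE DIRECT on NODE 2 'COMMIT PREPARED ''T1018''';
-- EXECUTE DIRECT on NODE 1,2 'ROLLBACK PREPARED ''T1018''';

This mirrors the decision rule Mason describes: commit everywhere if any node committed, abort everywhere otherwise, and do nothing (except perhaps log a warning) while a node is unreachable.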