From: Michael P. <mic...@gm...> - 2011-04-27 23:34:34

This patch is what I would call a "research" patch; it is not really ready for commit at this stage of development, and I am personally not satisfied with it. About COPY: I haven't run many tests yet, but this patch only rewrites INSERT queries. It also doesn't handle cases where nextval appears in WHERE clauses, because those are automatically pushed down to the nodes.

In order to handle that, I was thinking about changing the function expression of nextval directly when transforming the expression in TransformExpr. This would make the implementation a little deeper, so there may be side effects, but I suppose it could handle all the cases of queries needing sequence values from GTM. Those queries are basically the ones using nextval, currval, lastval (which does not need GTM) and setval. I also think it may be able to handle subqueries using sequence functions.

Another way of doing it would be to do the query transformation neither in the rewriter nor the analyzer, but directly in the executor; execution nodes would then be determined at execution time, but this would need a planner extension...

On Wed, Apr 27, 2011 at 8:55 PM, Mason <ma...@us...> wrote:
> It is nice to see this being tackled.
>
> Does the patch handle COPY, too?
>
> How about INSERT SELECT? (Or, have you guys disabled support for that?)

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Mason <ma...@us...> - 2011-04-27 11:56:04

On Wed, Apr 27, 2011 at 5:02 AM, Michael Paquier <mic...@gm...> wrote:
> Please find attached a patch that adds support for DEFAULT nextval for XC.
> The main idea of this patch is to detect when a nextval function is used as
> default and replace the function expression of nextval by a constant
> expression when rewriting the query.
> [...]
> Any comments about this way of doing are welcome.

It is nice to see this being tackled.

Does the patch handle COPY, too?

How about INSERT SELECT? (Or, have you guys disabled support for that?)

Thanks,

Mason
From: Michael P. <mic...@gm...> - 2011-04-27 09:03:01

Hi all,

Please find attached a patch that adds support for DEFAULT nextval for XC. The main idea of this patch is to detect when a nextval function is used as a default and to replace the function expression of nextval with a constant expression when rewriting the query. I haven't yet run regressions on this patch, and I am still looking for a way to extend this to other non-immutable functions.

The reason why I wanted to replace the value before the planner is that we determine at planner level which node(s) to target, so the rewriter looked like a good place for that.

Here is an example of a query for this patch:

create sequence toto;
create table ab (a int default nextval('toto'), b int);
insert into ab (b) values (2);
insert into ab (b) values (2);
insert into ab (b) values (2);

template1=# select * from ab;
 a | b
---+---
 1 | 2
 2 | 2
 3 | 2

Any comments about this way of doing it are welcome.

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2011-04-22 07:00:25

On Fri, Apr 22, 2011 at 12:14 PM, Andrei Martsinchyk <and...@gm...> wrote:
> I did not mean anything specific. Postgres code is pretty complex,

:) I agree, but it's simpler than some other DBMS code I have seen.

> I often had unforeseen issues when implementing things.
> I will let you know if I get an idea.

Sure, please do so, it will help.

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company
From: Andrei M. <and...@gm...> - 2011-04-22 06:45:09

2011/4/22 Ashutosh Bapat <ash...@en...>:
> BTW, you mentioned cons and pitfalls. Do you have a hunch or idea of
> what those could be? In case you have any idea, please share, so that
> we can avoid those before starting this effort.

I did not mean anything specific. Postgres code is pretty complex; I often had unforeseen issues when implementing things. I will let you know if I get an idea.

--
Best regards,
Andrei Martsinchyk mailto:and...@gm...
From: Ashutosh B. <ash...@en...> - 2011-04-22 06:13:51

On Fri, Apr 22, 2011 at 11:23 AM, Andrei Martsinchyk <and...@gm...> wrote:
> To support grouping you can either port grouping logic into RemoteQuery or
> create plan nodes on top of RemoteQuery and process its output, as more
> convenient for you. Either way has pros and cons; each may have hidden
> pitfalls.

Understandable. If we are to look at problems like aggregates, ordering, etc. one at a time, the current solutions are good, and they have laid down the basic logic of how to optimize each of those individual problems for an XC kind of environment. I really appreciate those solutions. Now, since we are adding more and more functionality (and ultimately aim to bring it on par with PG as far as functionality is concerned), we need to integrate these individual solutions into PG's inherent structure, so that our development can scale and is manageable. So maybe we should take the effort to do so now.

BTW, you mentioned cons and pitfalls. Do you have a hunch or idea of what those could be? In case you have any idea, please share, so that we can avoid those before starting this effort.

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company
From: Andrei M. <and...@gm...> - 2011-04-22 05:53:23

Hi Ashutosh,

This code was written more than a year ago, when XC had just been started. At that time I was not very familiar with Postgres internals, and the task was to quickly enable basic aggregate functions. So I found the code implementing aggregates and ported it into RemoteQuery, making up the Coordinator logic. I also had to change the Datanode part (the original code), because we did not want to finish aggregation on the data nodes, but to return an intermediate result instead.

I skipped porting the code related to grouping, because adding grouping support was not planned and the project schedule was tight.

To support grouping you can either port the grouping logic into RemoteQuery or create plan nodes on top of RemoteQuery and process its output, whichever is more convenient for you. Either way has pros and cons; each may have hidden pitfalls.

2011/4/21 Ashutosh Bapat <ash...@en...>:
> Hi All,
> I am looking at supporting and optimizing GROUP BY clauses in PGXC. I found
> that currently we are storing the information related to aggregates, order
> by clauses, distinct clauses in RemoteQuery/RemoteQueryState node.
> [...]

--
Best regards,
Andrei Martsinchyk mailto:and...@gm...
From: Michael P. <mic...@gm...> - 2011-04-21 23:19:59

On Thu, Apr 21, 2011 at 9:19 PM, Ashutosh Bapat <ash...@en...> wrote:
> Since there is no group by clause we expect the aggregate to return only
> one row, and thus an order by applied on it should not have any effect.

Such problems are the origin of a lot of diffs in the regressions.

> There may be a reason why we chose to design things the way they are.
> I am trying to understand the same. Can someone throw some light?

I unfortunately can't be of great help there, as I am not the original implementer of those functionalities. Maybe Andrei or Mason could explain why they chose such an implementation. It is true that it would be a pain to have to create several structures in RemoteQuery and RemoteQueryState (which are complicated enough already).

> If there is not a strong enough reason, we should rather use the existing
> PG nodes themselves, with special handling in the case of remote queries.

Well, I think it is a good idea. I particularly see no reason not to use PG nodes to manage the handling of remote queries; it would make the code easier to understand, like what is done for expressions in Postgres (expressions are simpler, though).

If anybody has comments, they are welcome.

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2011-04-21 12:19:41
|
Hi All, I am looking at supporting and optimizing GROUP BY clauses in PGXC. I found that currently we are storing the information related to aggregates, order by clauses, distinct clauses in RemoteQuery/RemoteQueryState node. This information is used during ExecRemoteQuery() to apply corresponding clauses. Looking at the code which builds the RemoteQuery* nodes and ExecRemoteQuery and ExecInitRemoteQuery(), I see that if we use aggregates and order by clause together the code would work wierdly since it does not handle such complexities. An example query is postgres=# select sum(val) from emp; sum ----- 3 (1 row) postgres=# select sum(val) from emp order by sum(val); sum ----- 1 2 (2 rows) Since there is no group by clause we expect the aggregate to return only one row and thus order by applied on it should not have any effect. Also, keeping clauses related information in RemoteQuery, needs us to create parallel nodes (structures) for aggregates, group by, order by, distinct etc. and thus duplicating the code. The more clauses and add complexity in those clauses (e.g. order by with group by, having etc, set operations etc.) we will start duplicating more code and thus increase maintenance later. There may be a reason why we chose to design the things the way they are. I am trying to understand the same. Can someone throw a light. If there is not strong enough reason, we should rather use the existing PG nodes themselves with special handling in case of remote queries. For example Agg and AggState need to understand that the rows they will receive are not the raw table rows, but the rows with transition results for aggregates OR they are already grouped by per node, (and may be ordered according to group clause). Similarly for OrderBy node, it needs to understand that the rows are already ordered per node thus it needs to apply merge sort. 
This might give us a slow start, but will be beneficial in the long run, since PG has all the necessary infrastructure in place to handle these nodes in whatever way they appear in the tree. Please comment. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Enterprise Postgres Company |
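The split Ashutosh describes — data nodes ship per-group transition results or pre-sorted rows, and the coordinator's Agg/OrderBy nodes finish the work — can be sketched as follows (an illustrative Python sketch, not XC code; the function names here are invented for the example):

```python
import heapq

# Phase 1 (per data node): compute a transition state instead of the final
# value -- e.g. (sum, count) for avg(), so states can be combined later.
def partial_avg(rows):
    total, count = 0, 0
    for v in rows:
        total += v
        count += 1
    return (total, count)

# Phase 2 (coordinator's Agg node): combine the per-node transition states
# into the final aggregate value.
def combine_avg(states):
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count if count else None

# ORDER BY: each node already returns its rows sorted, so the coordinator
# only needs a merge sort over the per-node streams, not a full re-sort.
def merge_sorted_streams(streams):
    return list(heapq.merge(*streams))
```

This is exactly why Agg/AggState would need to know that the incoming rows are transition results rather than raw table rows: combining `(sum, count)` pairs is a different operation from aggregating raw values.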
From: Michael P. <mic...@gm...> - 2011-04-20 12:24:20
|
Thanks for your report. This is a known issue and we are currently working on fixing it, so it is not necessary to file a bug report. It is thought to have been introduced by the merge with PostgreSQL 9.0.3. -- Thanks, Michael Paquier http://michael.otacoo.com |
From: 黄秋华 <ra...@16...> - 2011-04-15 05:14:03
|
OK, thanks a lot for reviewing the code! rahuahua -- At 2011-04-15 12:08:05, "Michael Paquier" <mic...@gm...> wrote: Regressions are OK. It has been committed with ID 820571e184fb6ae4dd4e63f14724afab283112e2. Thanks a lot for the good work ! Michael |
From: Michael P. <mic...@gm...> - 2011-04-15 04:08:13
|
Regressions are OK. It has been committed with ID 820571e184fb6ae4dd4e63f14724afab283112e2. Thanks a lot for the good work ! Michael |
From: Michael P. <mic...@gm...> - 2011-04-15 03:46:27
|
OK, thanks, I am running them too. By the way, you did a good job finding this bug. All my individual test cases are working now. -- Thanks, Michael Paquier http://michael.otacoo.com |
From: rahua1<ra...@16...> - 2011-04-15 00:59:58
|
OK, I will do the regression test. 2011-04-15 rahua1 From: Michael Paquier <mic...@gm...> Sent: 2011-04-15 08:22 Subject: Re: [Postgres-xc-developers] the patch for fixing bug#3124253 To: 黄秋华 <ra...@16...> Cc: pos...@li... Hi, Indeed there is a logic error in the code. But I think the problem you analyzed was bug 3231445 (which is related to 3140473), and not bug 3124253, which was solved last month. However, you are right that by launching the following queries: create table country (country_ab char(2), country_name char(30)); insert into country values ('ZZ','FOFO'),('AA','AAAA'); copy country to '/tmp/country.data'; I got the following in gdb: Breakpoint 1, build_copy_statement (cstate=0x10d7cd8, attnamelist=0x0, tupDesc=0x7f929b56c190, is_from=0 '\000', force_quote=0x0, force_notnull=0x0) at copy.c:3872 3872 ExecNodes *exec_nodes = makeNode(ExecNodes); (gdb) n 3879 cstate->rel_loc = GetRelationLocInfo(RelationGetRelid(cstate->rel)); (gdb) 3881 pPartByCol = GetRelationDistColumn(cstate->rel_loc); (gdb) p cstate->rel_loc $1 = (RelationLocInfo *) 0x10d7f48 (gdb) p *cstate->rel_loc $2 = {relid = 16436, locatorType = 78 'N', partAttrNum = 0, partAttrName = 0x0, nodeCount = 2, nodeList = 0x10d7fc8, roundRobinNode = 0x0} (gdb) n 3882 if (cstate->rel_loc) (gdb) 3884 if (is_from || pPartByCol) (gdb) 3892 exec_nodes->nodelist = GetAnyDataNode(); (gdb) This is indeed incorrect, because for a COPY TO on a round-robin table we need the connections to all nodes. The origin of the problem is that the table country has no distribution column, and the analysis is based on that. I think I'll be able to commit that patch after checking that it passes the regressions and the attached test cases (which may interest you). -- Thanks, Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2011-04-15 00:22:48
|
Hi, Indeed there is a logic error in the code. But I think the problem you analyzed was bug 3231445 (which is related to 3140473), and not bug 3124253, which was solved last month. However, you are right that by launching the following queries: create table country (country_ab char(2), country_name char(30)); insert into country values ('ZZ','FOFO'),('AA','AAAA'); copy country to '/tmp/country.data'; I got the following in gdb: Breakpoint 1, build_copy_statement (cstate=0x10d7cd8, attnamelist=0x0, tupDesc=0x7f929b56c190, is_from=0 '\000', force_quote=0x0, force_notnull=0x0) at copy.c:3872 3872 ExecNodes *exec_nodes = makeNode(ExecNodes); (gdb) n 3879 cstate->rel_loc = GetRelationLocInfo(RelationGetRelid(cstate->rel)); (gdb) 3881 pPartByCol = GetRelationDistColumn(cstate->rel_loc); (gdb) p cstate->rel_loc $1 = (RelationLocInfo *) 0x10d7f48 (gdb) p *cstate->rel_loc $2 = {relid = 16436, locatorType = 78 'N', partAttrNum = 0, partAttrName = 0x0, nodeCount = 2, nodeList = 0x10d7fc8, roundRobinNode = 0x0} (gdb) n 3882 if (cstate->rel_loc) (gdb) 3884 if (is_from || pPartByCol) (gdb) 3892 exec_nodes->nodelist = GetAnyDataNode(); (gdb) This is indeed incorrect, because for a COPY TO on a round-robin table we need the connections to all nodes. The origin of the problem is that the table country has no distribution column, and the analysis is based on that. I think I'll be able to commit that patch after checking that it passes the regressions and the attached test cases (which may interest you). -- Thanks, Michael Paquier http://michael.otacoo.com |
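The intended node selection that the trace argues for can be condensed into a short sketch (illustrative Python, not the actual build_copy_statement code; the function name and node list are invented for the example):

```python
def copy_exec_nodes(all_nodes, is_from):
    """Node selection for COPY on a table with no distribution column
    (locatorType 'N', round robin, as in the gdb trace above).

    COPY FROM can start from any single node and round-robin the rows, but
    COPY TO must read from all nodes -- picking just one, as GetAnyDataNode()
    did at copy.c:3892 in the trace, silently misses the rows stored on the
    other nodes.
    """
    if is_from:
        return [all_nodes[0]]   # stand-in for GetAnyDataNode()
    return list(all_nodes)      # COPY TO needs connections to every node
```

With two data nodes, `copy_exec_nodes(["node1", "node2"], is_from=False)` must target both, which is exactly what the buggy `is_from || pPartByCol` branch failed to do.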
From: Michael P. <mic...@gm...> - 2011-04-14 23:24:31
|
OK, thanks. I am going to have a look at it and may have some feedback soon. -- Thanks, Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2011-04-14 06:28:31
|
Hi, This version was better, thanks. I just noticed a couple of typos, but it's OK; it has been committed. -- Thanks, Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2011-04-14 05:00:50
|
Hi, I am still digging into this issue without really finding anything for the moment. By the way, I have a question for Andrei: when copying a slot, why this comment? /* if it is correct to copy Datums using assignment? */ dst->tts_values[i] = src->tts_values[i]; dst->tts_isnull[i] = src->tts_isnull[i]; At this point a copy of the attribute values and null flags is made from a source slot to a destination slot. Could this be the cause of the cache problem? -- Thanks, Michael Paquier http://michael.otacoo.com |
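Whether plain assignment is safe depends on whether the Datum is pass-by-value or pass-by-reference: for a by-reference type, assignment copies only the pointer, so the destination slot still points into the source slot's memory. The concern in the quoted comment can be illustrated with a toy analogy (Python stand-in, not PG code):

```python
import copy

# A pass-by-reference Datum is essentially a pointer.  Plain assignment
# (dst->tts_values[i] = src->tts_values[i]) copies the pointer only.
src_values = [bytearray(b"abc"), bytearray(b"def")]

shallow = list(src_values)                      # pointer copy, like the assignment above
deep = [copy.deepcopy(v) for v in src_values]   # byte copy, like datumCopy() would do

# If the source slot's memory is later reused (e.g. its memory context is
# reset), the shallow copy sees the clobbered bytes while the deep copy
# keeps the original value:
src_values[0][:] = b"XYZ"
```

If the destination slot can outlive the source slot's memory, this kind of aliasing could indeed produce cache-like corruption; by-value Datums, on the other hand, are safe to copy by assignment.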
From: Wang D. <dia...@gm...> - 2011-04-14 02:51:33
|
Hi, How about this version? Michael Paquier <mic...@gm...> writes: > Thanks a lot for this input. > I had a quick look at this patch and syntax is correct. > However it looks there is a lot of duplicated code. > > Couldn't it be possible to put the functions SetDataDir and make_absolute_path directly in the file > src/gtm/path.c ? > We could imagine that SetDataDir has a malloc'd string as return value instead of using this > function to save the new static repository directly in this function. > > Such a solution avoids unnecessary duplication. > -- > Thanks, > > Michael Paquier > http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2011-04-13 11:48:52
|
> > Couldn't it be possible to put the functions SetDataDir and > make_absolute_path directly in the file src/gtm/path.c ? > Sorry, the file I was talking about is src/gtm/path/path.c. -- Thanks, Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2011-04-13 11:46:41
|
Thanks a lot for this input. I had a quick look at this patch and the syntax is correct. However, it looks like there is a lot of duplicated code. Couldn't it be possible to put the functions SetDataDir and make_absolute_path directly in the file src/gtm/path.c ? We could have SetDataDir return a malloc'd string instead of saving the new directory into a static variable directly inside the function. Such a solution avoids unnecessary duplication. -- Thanks, Michael Paquier http://michael.otacoo.com |
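The suggested refactoring — one shared helper whose result is returned to the caller, rather than two copies each writing into their own static variable — can be sketched like this (Python stand-in for the C helpers; the function names follow the mail, the rest is invented for the example):

```python
import os

def make_absolute_path(path):
    # Resolve a relative data directory against the current working
    # directory, as a single shared src/gtm/path/path.c helper would.
    return path if os.path.isabs(path) else os.path.join(os.getcwd(), path)

def set_data_dir(path):
    # Return the resolved path (the "malloc'd string" of the mail) instead
    # of stashing it in a function-local static, so gtm and gtm_proxy can
    # call the same code without duplicating it.
    return make_absolute_path(path)
```

Returning the string keeps the helper stateless, which is what makes it safe to move into one shared file.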
From: Abbas B. <abb...@te...> - 2011-04-13 11:15:50
|
Oh thanks a lot, we will review this patch soon. On Wed, Apr 13, 2011 at 4:12 PM, Wang Diancheng <dia...@gm...>wrote: > Hi, > > convert the relative path to the absolute path in gtm and gtm_proxy. > > > > ------------------------------------------------------------------------------ > Forrester Wave Report - Recovery time is now measured in hours and minutes > not days. Key insights are discussed in the 2010 Forrester Wave Report as > part of an in-depth evaluation of disaster recovery service providers. > Forrester found the best-in-class provider in terms of services and vision. > Read this report now! http://p.sf.net/sfu/ibm-webcastpromo > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > |
From: Wang D. <dia...@gm...> - 2011-04-13 11:12:36
|
Hi, this patch converts relative paths to absolute paths in gtm and gtm_proxy. |
From: Abbas B. <abb...@te...> - 2011-04-13 10:37:48
|
The coordinator should not be trying to execute the function; only the data nodes are supposed to. If the query is sent directly to the data nodes it seems to work, which means that the whole mechanism of function execution is correct. On Wed, Apr 13, 2011 at 3:12 PM, Abbas Butt <abb...@te...> wrote: > I have observed that when the coordinator receives this query from the > client > > SELECT p.hobbies FROM person p; > the query sent to the data nodes is > SELECT p.* FROM public.person p > which is incorrect. In fact the coordinator should have sent the same > query to the data nodes. > > On Wed, Apr 13, 2011 at 1:06 PM, Michael Paquier < > mic...@gm...> wrote: > >> Hi, >> >> Could it be possible that the origin of this bug is that function >> expressions are not preprocessed in the current planner of XC? >> I found that in the Postgres planner: >> /* Also need to preprocess expressions for function and values RTEs */ >> foreach(l, parse->rtable) >> { >> RangeTblEntry *rte = (RangeTblEntry *) lfirst(l); >> >> if (rte->rtekind == RTE_FUNCTION) >> rte->funcexpr = preprocess_expression(root, rte->funcexpr, >> EXPRKIND_RTFUNC); >> else if (rte->rtekind == RTE_VALUES) >> rte->values_lists = (List *) >> preprocess_expression(root, (Node *) rte->values_lists, >> EXPRKIND_VALUES); >> } >> >> And this is not done in XC on a remote Coordinator. >> >> -- >> Thanks, >> >> Michael Paquier >> http://michael.otacoo.com >> >> > |
From: Abbas B. <abb...@te...> - 2011-04-13 10:13:06
|
I have observed that when the coordinator receives this query from the client SELECT p.hobbies FROM person p; the query sent to the data nodes is SELECT p.* FROM public.person p which is incorrect. In fact the coordinator should have sent the same query to the data nodes. On Wed, Apr 13, 2011 at 1:06 PM, Michael Paquier <mic...@gm...> wrote: > Hi, > > Could it be possible that the origin of this bug is that function > expressions are not preprocessed in the current planner of XC? > I found that in the Postgres planner: > /* Also need to preprocess expressions for function and values RTEs */ > foreach(l, parse->rtable) > { > RangeTblEntry *rte = (RangeTblEntry *) lfirst(l); > > if (rte->rtekind == RTE_FUNCTION) > rte->funcexpr = preprocess_expression(root, rte->funcexpr, > EXPRKIND_RTFUNC); > else if (rte->rtekind == RTE_VALUES) > rte->values_lists = (List *) > preprocess_expression(root, (Node *) rte->values_lists, > EXPRKIND_VALUES); > } > > And this is not done in XC on a remote Coordinator. > > -- > Thanks, > > Michael Paquier > http://michael.otacoo.com > > |