From: Ashutosh B. <ash...@en...> - 2011-04-13 09:59:18
> Yes, a custom aggregate function may need different transition and
> collection data types. It is unnecessary complexity for a developer to keep
> in mind that the collection function may not be invoked and that the final
> function should be ready to accept a raw transition value. Also, I think
> the XC code would be simpler and cleaner if 3-step aggregation is always
> used.

Examples? (Except sum and count, which I think I have fixed in the attached
patch.) To me, the collection function is on par with the transition function
if each node contains only one row, and the transition function is on par
with the collection function when all the data is on the same node. So I
don't see any need for separate collection and transition data types.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Andrei M. <and...@gm...> - 2011-04-13 09:54:31
2011/4/13 Ashutosh Bapat <ash...@en...>:
> As of now, it's so only for count and sum (see the pg_aggregate listing for
> aggtranstype != aggcollecttype in the message below).
>
> The count and sum functions really should not need the final function. Both
> of them can return the transition or collection data as-is (well, barring
> the artificial datatype conversion, which can be fixed).
>
> AFAIU, the collection function is the same as the transition function
> except that it applies the same operations to the data gathered from the
> data nodes as the transition function applies to the rows. Hence it makes
> sense for both of them to have the same return types.
>
> Are there any cases where the return types of the collection function and
> the transition function differ (apart from sum and count)?

Yes, a custom aggregate function may need different transition and collection
data types. It is unnecessary complexity for a developer to keep in mind that
the collection function may not be invoked and that the final function should
be ready to accept a raw transition value. Also, I think the XC code would be
simpler and cleaner if 3-step aggregation is always used.

--
Best regards,
Andrei Martsinchyk mailto:and...@gm...
From: Ashutosh B. <ash...@en...> - 2011-04-13 08:52:59
> The transition type and collection type are different for some built-in
> aggregate functions and may be different for user-defined aggregates. If
> you require them to be the same, you can skip calling the collection
> function if there is a result from only one node. I think this improvement
> is too small to limit flexibility. In particular, if aggregation occurs on
> the coordinator only, it is not too much overhead to invoke the collection
> function before the final function - the collection function would be
> trivial in that case, just copying the transition value.

As of now, it's so only for count and sum:

postgres=# select * from pg_aggregate where aggtranstype != aggcollecttype;
     aggfnoid     |      aggtransfn       | aggcollectfn |   aggfinalfn    | aggsortop | aggtranstype | aggcollecttype | agginitval | agginitcollect
------------------+-----------------------+--------------+-----------------+-----------+--------------+----------------+------------+----------------
 pg_catalog.sum   | int4_sum              | int8_sum     | pg_catalog.int8 |         0 |           20 |           1700 |            |
 pg_catalog.sum   | int2_sum              | int8_sum     | pg_catalog.int8 |         0 |           20 |           1700 |            |
 pg_catalog.count | int8inc_any           | int8_sum     | pg_catalog.int8 |         0 |           20 |           1700 | 0          |
 pg_catalog.count | int8inc               | int8_sum     | pg_catalog.int8 |         0 |           20 |           1700 | 0          |
 regr_count       | int8inc_float8_float8 | int8_sum     | pg_catalog.int8 |         0 |           20 |           1700 | 0          |

The count and sum functions really should not need the final function. Both
of them can return the transition or collection data as-is (well, barring
the artificial datatype conversion, which can be fixed).

AFAIU, the collection function is the same as the transition function except
that it applies the same operations to the data gathered from the data nodes
as the transition function applies to the rows. Hence it makes sense for
both of them to have the same return types.

Are there any cases where the return types of the collection function and
the transition function differ (apart from sum and count)?

If there are some valid cases, your second solution of applying the
collection function before the final function makes sense.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2011-04-13 08:06:15
Hi,

Could it be possible that the origin of this bug is that function
expressions are not preprocessed in the current planner of XC? I found this
in the Postgres planner:

/* Also need to preprocess expressions for function and values RTEs */
foreach(l, parse->rtable)
{
    RangeTblEntry *rte = (RangeTblEntry *) lfirst(l);

    if (rte->rtekind == RTE_FUNCTION)
        rte->funcexpr = preprocess_expression(root, rte->funcexpr,
                                              EXPRKIND_RTFUNC);
    else if (rte->rtekind == RTE_VALUES)
        rte->values_lists = (List *)
            preprocess_expression(root, (Node *) rte->values_lists,
                                  EXPRKIND_VALUES);
}

And this is not done in XC on the remote Coordinator.

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-04-13 07:49:04
I just found that the parameter list is being assigned there:

postquel_sub_params(SQLFunctionCachePtr fcache, FunctionCallInfo fcinfo)

based on the value of fcinfo. This information does not look to be set
correctly.

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Andrei M. <and...@gm...> - 2011-04-13 07:23:40
2011/4/13 Ashutosh Bapat <ash...@en...>:
> I think the return type of the collection function and the transition
> function should be the same, so that the final function would work in both
> cases:
>
> 1. The final function is applied directly over the transition function
> result - cases when aggregates cannot be pushed to the data nodes.
>
> 2. The final function is applied over the collection function result -
> cases when aggregates are pushed to the data nodes.
>
> In the first case the aggregates and GROUP BY work the same way as in
> plain PG (applying aggregates row by row) at the coordinator. In the
> second case they work the XC way (applying aggregates over the aggregation
> results of the datanodes) at the coordinator. In both cases the datanodes
> won't apply the final function but return the final results of the
> transition functions.
>
> For all other aggregates, where we see no difference between PG and XC,
> this method will work since the final function is the same in both cases
> (so the collection function and the transition function should have the
> same result types). Only the differing cases, viz. array_agg, string_agg,
> count and sum, will need changes in the collection function.

The transition type and collection type are different for some built-in
aggregate functions and may be different for user-defined aggregates. If you
require them to be the same, you can skip calling the collection function if
there is a result from only one node. I think this improvement is too small
to limit flexibility. In particular, if aggregation occurs on the
coordinator only, it is not too much overhead to invoke the collection
function before the final function - the collection function would be
trivial in that case, just copying the transition value.

--
Best regards,
Andrei Martsinchyk mailto:and...@gm...
From: Michael P. <mic...@gm...> - 2011-04-13 07:05:11
On Wed, Apr 13, 2011 at 3:58 PM, Andrei Martsinchyk <and...@gm...> wrote:
> Probably the type Oid is not always converted to a name; this should be
> investigated.

This looks to be the best possibility regarding the origin of the problem.
There are places where the type Oid is not converted to the type name, and I
would say it is when the function is created, or something like this...

The second possibility could be related to the fact that XC goes through the
executor twice: the 1st time for the query SELECT p.hobbies FROM person p;
and the 2nd time for the SELECT query inside the function. As I think that
function treatment is done on the local Coordinator, there is something
messing up parameters at the 1st step.

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2011-04-13 06:59:54
> That's a tricky thing. typmod is the type modifier indicator; it means
> different things for different types. For varchar/char it indicates the
> length of the string that can be stored in the column; for numeric it
> gives the decimal places, etc. In some cases it changes the behaviour of
> the type completely; for example, the typmod of local/run-time RECORD
> types indicates different record types. But it is what is supplied by the
> user at the time of table creation, and thus cannot differ across nodes,
> unlike OID.

Well, in the case of XC, OID is not consistent among nodes. But as you are
saying, typmod looks to be consistent in the cluster. Even if it modifies
the type, it does so locally.

> Does that help?

Thanks, I understand better now.

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Andrei M. <and...@gm...> - 2011-04-13 06:58:35
Probably the type Oid is not always converted to a name; this should be
investigated.

The type modifier (atttypmod) is not an identifier; it indicates additional
type-specific info. It may be the specified length of a varchar, for
example.

2011/4/13 Michael Paquier <mic...@gm...>:
> We've fixed an issue a couple of weeks ago to be able to translate Oids
> for type names among nodes by exchanging the string, and not the Oid,
> between the nodes. This has solved a couple of issues related to cache
> lookup.
>
> Now in XC we also exchange what is called the type modifier (atttypmod in
> pg_attribute). I read in the comments that this is type-specific data
> supplied at table creation time. It is an integer, but what is it
> basically?
>
> May this problem be related to that? Is typmod consistent among nodes?

--
Best regards,
Andrei Martsinchyk mailto:and...@gm...
From: Ashutosh B. <ash...@en...> - 2011-04-13 06:55:40
On Wed, Apr 13, 2011 at 12:19 PM, Michael Paquier <mic...@gm...> wrote:
> Now in XC we also exchange what is called the type modifier (atttypmod in
> pg_attribute). I read in the comments that this is type-specific data
> supplied at table creation time. It is an integer, but what is it
> basically? May this problem be related to that? Is typmod consistent among
> nodes?

That's a tricky thing. typmod is the type modifier indicator; it means
different things for different types. For varchar/char it indicates the
length of the string that can be stored in the column; for numeric it gives
the decimal places, etc. In some cases it changes the behaviour of the type
completely; for example, the typmod of local/run-time RECORD types indicates
different record types. But it is what is supplied by the user at the time
of table creation, and thus cannot differ across nodes, unlike OID.

Does that help?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
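To illustrate what typmod encodes (a sketch: the table and column names are
made up; the encoded values follow stock PostgreSQL conventions):

CREATE TABLE t (v varchar(20), n numeric(10,2));

SELECT attname, atttypid::regtype, atttypmod
  FROM pg_attribute
 WHERE attrelid = 't'::regclass AND attnum > 0;

--  attname |      atttypid      | atttypmod
-- ---------+--------------------+-----------
--  v       | character varying  |        24   -- 20 + 4-byte varlena header
--  n       | numeric            |    655366   -- (10 << 16 | 2) + 4

Since these values are derived purely from what the user wrote in CREATE
TABLE, the same DDL yields the same typmod on every node.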
From: Ashutosh B. <ash...@en...> - 2011-04-13 06:50:11
On Tue, Apr 12, 2011 at 6:42 PM, Andrei Martsinchyk <and...@gm...> wrote:
> Yes, it is possible; I added new functions for some aggregates. But it
> works with the existing functions already. If something is broken and
> aggregates do not work as expected, the workaround will help with sum()
> and count(), but other aggregates where a final function is required won't
> work. The root cause should be fixed.

I think the return type of the collection function and the transition
function should be the same, so that the final function would work in both
cases:

1. The final function is applied directly over the transition function
result - cases when aggregates cannot be pushed to the data nodes.

2. The final function is applied over the collection function result - cases
when aggregates are pushed to the data nodes.

In the first case the aggregates and GROUP BY work the same way as in plain
PG (applying aggregates row by row) at the coordinator. In the second case
the aggregates and GROUP BY work the XC way (applying aggregates over the
aggregation results of the datanodes) at the coordinator. In both cases the
datanodes won't apply the final function but return the final results of the
transition functions.

For all other aggregates, where we see no difference between PG and XC, this
method will work since the final function is the same in both cases (so the
collection function and the transition function should have the same result
types). Only the differing cases, viz. array_agg, string_agg, count and sum,
will need changes in the collection function.

> Aggregates used to work. Probably something got broken during the merge
> with Postgres 9.0.3. I have not looked into the latest code, so it is hard
> to guess what is wrong. I will try to find time to take a look.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2011-04-13 06:49:11
On Wed, Apr 13, 2011 at 3:36 PM, Andrei Martsinchyk <and...@gm...> wrote:
> To solve this issue XC should ensure the respective Oids are the same on
> all cluster nodes, or translate the Oids being sent between the nodes.

We fixed an issue a couple of weeks ago to be able to translate Oids for
type names among nodes by exchanging the string, and not the Oid, between
the nodes. This has solved a couple of issues related to cache lookup.

Now in XC we also exchange what is called the type modifier (atttypmod in
pg_attribute). I read in the comments that this is type-specific data
supplied at table creation time. It is an integer, but what is it basically?

May this problem be related to that? Is typmod consistent among nodes?

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Andrei M. <and...@gm...> - 2011-04-13 06:36:07
Hi Michael,

Off the top of my head, the problem is because of different data types on
the coordinator and the data nodes. When a table is created, Postgres also
creates a data type representing a row of that table. The type is identified
by an Oid, and these Oids in general are different on the coordinator and on
the data nodes.

In your case, an Oid identifying a data type was sent between the
coordinator and a datanode (I am not sure in which direction), and the
receiving side tried to look up the type info by that Oid.

To solve this issue XC should ensure the respective Oids are the same on all
cluster nodes, or translate the Oids being sent between the nodes.

2011/4/13 Michael Paquier <mic...@gm...>:
> I am having this test case failing in a strange manner (full test case in
> the message below):
> SELECT p.hobbies FROM person p;
> -- ERROR: cache lookup failed for type 29119504

--
Best regards,
Andrei Martsinchyk mailto:and...@gm...
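To see the mismatch described above (a sketch: the Oid values shown are made
up and will differ in any real cluster), one could compare the row type's
Oid on each node directly:

-- On the coordinator:
SELECT oid, typname FROM pg_type WHERE typname = 'person';
--   oid  | typname
-- -------+---------
--  16391 | person

-- On a data node:
SELECT oid, typname FROM pg_type WHERE typname = 'person';
--   oid  | typname
-- -------+---------
--  16407 | person      -- a different Oid for the same logical type

Any message that ships the raw Oid rather than the type name will therefore
fail the cache lookup on the receiving side.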
From: Michael P. <mic...@gm...> - 2011-04-13 06:11:16
Hi Andrei,

I am having this test case failing in a strange manner:

CREATE TABLE hobbies_r (hobby_name text, person_n text);
CREATE TABLE person (person_name text, age int4, location point);
INSERT INTO person VALUES ('foo', 16, '(5.5,2.5)');
INSERT INTO hobbies_r VALUES ('posthacking', 'mike');
CREATE FUNCTION hobbies(person)
  RETURNS setof hobbies_r
  AS 'select * from hobbies_r where person_n = $1.person_name'
  LANGUAGE SQL;
SELECT p.hobbies FROM person p;
-- ERROR:  cache lookup failed for type 29119504
-- CONTEXT:  SQL function "hobbies" statement 1

After digging into the code, I found that the cache lookup is failing when
calling ParamListToDataRow, which is in charge of transforming function
parameters (here $1) into the DataRow message format sent to the Datanodes.

The problem occurs after detoasting a Datum value of a varlena type (here
person is a record_out type).

In my test case, the parameter type of the function is the table "person"
itself. So is it really appropriate to detoast a value of a type like that:
PointerGetDatum(PG_DETOAST_DATUM(param->value)); for a parameter which is
the table? Is there a special process related to detoasting that permits
treating this case? I saw that the type Oids are correct, but the parameter
value is weird. Doesn't the detoast of a record_out type have side effects?

May it be a planner issue? An initialization not done?

Then I have another question: which module is in charge of encoding the
parameter type to DataRow format? Is it the query planner or the remote
executor?

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Koichi S. <ko...@in...> - 2011-04-13 01:19:49
Perfect!
---
Koichi

On Wed, 13 Apr 2011 09:30:12 +0900, Michael Paquier <mic...@gm...> wrote:
> After User 1 commits, the transaction on its side is over, so it can see
> the results of the rows inserted by User 2. In the case above, I checked
> that after commit User 1 was getting 4 as a result. So it looks to work...
From: Michael P. <mic...@gm...> - 2011-04-13 00:30:18
After User 1 commits, the transaction on its side is over, so it can see the
results of the rows inserted by User 2.

In the case above, I checked that after commit User 1 was getting 4 as a
result:

template1=# select count(*) from aa;
 count
-------
     2
(1 row)

template1=# commit;
COMMIT
template1=# select count(*) from aa;
 count
-------
     4
(1 row)

So it looks to work...

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Koichi S. <koi...@gm...> - 2011-04-13 00:23:12
I'm curious what we have when User 1 "commits" after User 2 added new rows.
Before "commit" User 1 got 2; if serializable works correctly, User 1 should
get 4 after committing.

----------
Koichi Suzuki

2011/4/13 Michael Paquier <mic...@gm...>:
> It looks that since I added support for session parameters, the
> serializable isolation level is supported in XC (full examples in the
> message below).
From: Michael P. <mic...@gm...> - 2011-04-12 23:55:19
Hi all,

I just noticed one thing: since I added support for session parameters, the
serializable isolation level looks to be supported in XC.

Example for read committed, with session 1 connected to Coordinator 1 and
session 2 connected to Coordinator 2:

User 1:
begin;

User 2:
create table aa (a int);
insert into aa values (1),(2);

User 1:
select count(*) from aa;
=> result = 2

User 2:
insert into aa values (3),(4);

User 1:
select count(*) from aa;
=> result = 4
commit;

So far everything is normal: the default is read committed, and read
committed means that a transaction can see the results of transactions that
commit while it is running.

Example for serializable:

User 1:
begin;
set transaction isolation level serializable;

User 2:
create table aa (a int);
insert into aa values (1),(2);

User 1:
select count(*) from aa;
=> result = 2

User 2:
insert into aa values (3),(4);

User 1:
select count(*) from aa;
=> result = 2
commit;

This is also correct: with serializable, an open transaction cannot see the
results of transactions that committed after it began.

Am I missing something, gentlemen? May there be some snapshot feed issue?

--
Thanks,
Michael Paquier
http://michael.otacoo.com
From: Andrei M. <and...@gm...> - 2011-04-12 13:12:36
2011/4/12 Ashutosh Bapat <ash...@en...>:
> In PG, the sums of integers all result in int8, whereas in PGXC they
> result in numeric and are cast back to int8. Maybe we should use a new
> function int8_sum(int8, int8):int8 instead of int8_sum(numeric,
> int8):numeric. That way we don't need any final function for sum, just
> like PG.

Yes, it is possible; I added new functions for some aggregates. But it works
with the existing functions already. If something is broken and aggregates
do not work as expected, the workaround will help with sum() and count(),
but other aggregates where a final function is required won't work. The root
cause should be fixed.

> This will help us set the finalfn_oid in ExecInitAgg() and we will have
> GROUP BY running, albeit slowly (full details in the message below).

Aggregates used to work. Probably something got broken during the merge with
Postgres 9.0.3. I have not looked into the latest code, so it is hard to
guess what is wrong. I will try to find time to take a look.

--
Best regards,
Andrei Martsinchyk mailto:and...@gm...
From: Ashutosh B. <ash...@en...> - 2011-04-12 12:48:26
On Tue, Apr 12, 2011 at 5:37 PM, Andrei Martsinchyk <and...@gm...> wrote:
> Regarding sum() and count(), they have to perform a typecast on the final
> step. In Postgres, sum(int4) returns int8 and the accumulating function is
> defined like sum_agg(int8, int4):int8; in XC this function performs
> pre-aggregation. The combining function is sum_agg(numeric, int8):numeric;
> Postgres does not have sum_agg(int8, int8):int8. So XC has to convert
> numeric to int8 to return a value of the declared type, while in Postgres
> the accumulated value can be returned without conversion.

The PGXC pg_aggregate entries for sum look like:

postgres=# select * from pg_aggregate where aggfnoid in (select oid from pg_proc where proname = 'sum');
    aggfnoid    | aggtransfn | aggcollectfn |   aggfinalfn    | aggsortop | aggtranstype | aggcollecttype | agginitval | agginitcollect
----------------+------------+--------------+-----------------+-----------+--------------+----------------+------------+----------------
 pg_catalog.sum | int8_sum   | numeric_add  | -               |         0 |         1700 |           1700 |            |
 pg_catalog.sum | int4_sum   | int8_sum     | pg_catalog.int8 |         0 |           20 |           1700 |            |
 pg_catalog.sum | int2_sum   | int8_sum     | pg_catalog.int8 |         0 |           20 |           1700 |            |

And the plain PG entries look like:

testdb=# select * from pg_aggregate where aggfnoid in (select oid from pg_proc where proname = 'sum');
    aggfnoid    | aggtransfn | aggfinalfn | aggsortop | aggtranstype | agginitval
----------------+------------+------------+-----------+--------------+------------
 pg_catalog.sum | int8_sum   | -          |         0 |         1700 |
 pg_catalog.sum | int4_sum   | -          |         0 |           20 |
 pg_catalog.sum | int2_sum   | -          |         0 |           20 |

In PG, the sums of integers all result in int8, whereas in PGXC they result
in numeric and are cast back to int8. Maybe we should use a new function
int8_sum(int8, int8):int8 instead of int8_sum(numeric, int8):numeric. That
way we don't need any final function for sum, just like PG.

This will help us set the finalfn_oid in ExecInitAgg() and we will have
GROUP BY running (albeit slowly). This has another impact: plain aggregates
(without any GROUP BY) with JOINs are not working currently. For example,
the query

select avg(emp.val * dept.val) from emp, dept;

returns 0 (1 row with value 0) even if there is some non-zero data in those
tables. This is because we do not set finalfn_oid in ExecInitAgg(), and the
tree for the above query looks like

AggState(NestedLoop(RemoteQuery(select val from emp),
                    RemoteQuery(select val from dept)))

Thus the aggregate is not pushed down to the data node. While finalising the
aggregate result, it does not find finalfnoid and thus returns false
results. If we can set finalfnoid as done in the attached patch, we will get
GROUP BY running, albeit suboptimally.

Any thoughts?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
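A sketch of the combining function proposed above (hypothetical: the name
int8_sum_to_int8 and the SQL-level body are illustrative only; the real
change would be a C function plus a pg_aggregate catalog update):

CREATE FUNCTION int8_sum_to_int8(int8, int8) RETURNS int8
    AS 'select case when $1 is null then $2
                    when $2 is null then $1
                    else $1 + $2 end'
    LANGUAGE SQL IMMUTABLE;

-- With this as aggcollectfn for sum(int4)/sum(int2), the collection type
-- becomes int8, it already matches the declared result type, and
-- aggfinalfn can be dropped, mirroring plain PG.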
From: Andrei M. <and...@gm...> - 2011-04-12 12:07:19
|
Hi Ashutosh,

In XC, aggregates were changed to perform pre-aggregation on the data
nodes; that was done to reduce the amount of data sent over the network.
If aggregation happened on the coordinator, each data node would return
the entire result set; instead, it returns one row (one row per group, if
GROUP BY exists). Naturally, the definitions of aggregate functions in XC
were changed.

A Postgres aggregate is implemented by two functions: the first is
invoked once per row and adds its argument to an internal accumulated
value; the second is invoked once per group and converts the accumulated
value to the aggregation result.

An XC aggregate is implemented by three functions: the first (transition)
is invoked on a data node once per row and adds its argument to an
internal accumulated value, which is sent to the coordinator after the
group is processed; the second (collection) is invoked on the coordinator
once per received pre-aggregated row and combines the pre-aggregated
values together; the third (final) is invoked on the coordinator once per
group and converts the accumulated value to the aggregation result.

Regarding sum() and count(): they have to perform a typecast on the final
step. In Postgres, sum(int4) returns int8, and the accumulating function
is defined like sum_agg(int8, int4):int8; in XC this function performs
the pre-aggregation. The combining function is
sum_agg(numeric, int8):numeric -- Postgres does not have
sum_agg(int8, int8):int8 -- so XC has to convert numeric to int8 to
return a value of the declared type, while in Postgres the accumulated
value can be returned without conversion.

I guess in your code one of the aggregation steps is missing.

Hope this helps.

2011/4/12 Ashutosh Bapat <ash...@en...>

> Hi,
> I took outputs of the query "select aggfnoid, aggfinalfn from
> pg_aggregate where aggfinalfn != 0;" against plain Postgres and PGXC.
> It showed the following difference:
>
> [ashutosh@anand PG_HEAD]diff /tmp/pgxc_aggfinalfn.out /tmp/pg_aggfinalfn.out
> 10,13d9
> < pg_catalog.sum   | pg_catalog.int8
> < pg_catalog.sum   | pg_catalog.int8
> < pg_catalog.count | pg_catalog.int8
> < pg_catalog.count | pg_catalog.int8
> 50d45
> < regr_count       | pg_catalog.int8
> 62c57,59
> < (59 rows)
> ---
> > array_agg  | array_agg_finalfn
> > string_agg | string_agg_finalfn
> > (56 rows)
>
> XC has final functions set for the aggregates sum and count, whereas
> plain Postgres does not; conversely, plain Postgres has final functions
> for array_agg and string_agg, but XC does not. Why is this difference?
>
> As of now, in XC, for GROUP BY queries the coordinator receives plain
> data from the data nodes, stripped of any aggregates or GROUP BY clause.
> I was trying to use the PG mechanism to calculate the aggregates (so as
> to enable GROUP BY clauses quickly). It worked for AVG, but for SUM it
> ended up calling numeric_int8() because of the above entries, which hit
> a segfault since the data passed to it is not numeric. Given that, it's
> important to know whether those differences matter. NOTE: This won't be
> the final version of GROUP BY support. I am trying to design it in such
> a way that we can push GROUP BY down to the data nodes.
>
> The changes were added by commit 8326f619.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Enterprise Postgres Company
>

--
Best regards,
Andrei Martsinchyk mailto:and...@gm...
|
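A minimal sketch of the three-step flow described above, emulated in
plain SQL (the tables node1 and node2 are hypothetical stand-ins for the
per-node slices of one distributed table; the inner queries play the role
of the transition step on each data node, and the outer query plays the
collection and final steps on the coordinator):

    -- avg(x) computed as a three-step distributed aggregate
    SELECT sum(part_sum)::numeric / sum(part_count) AS avg_x  -- collection + final
    FROM (
        SELECT sum(x) AS part_sum, count(x) AS part_count FROM node1  -- transition
        UNION ALL
        SELECT sum(x) AS part_sum, count(x) AS part_count FROM node2  -- transition
    ) AS per_node_partials;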
From: Ashutosh B. <ash...@en...> - 2011-04-12 10:54:20
|
Hi,

I took outputs of the query "select aggfnoid, aggfinalfn from
pg_aggregate where aggfinalfn != 0;" against plain Postgres and PGXC. It
showed the following difference:

[ashutosh@anand PG_HEAD]diff /tmp/pgxc_aggfinalfn.out /tmp/pg_aggfinalfn.out
10,13d9
< pg_catalog.sum   | pg_catalog.int8
< pg_catalog.sum   | pg_catalog.int8
< pg_catalog.count | pg_catalog.int8
< pg_catalog.count | pg_catalog.int8
50d45
< regr_count       | pg_catalog.int8
62c57,59
< (59 rows)
---
> array_agg  | array_agg_finalfn
> string_agg | string_agg_finalfn
> (56 rows)

XC has final functions set for the aggregates sum and count, whereas
plain Postgres does not; conversely, plain Postgres has final functions
for array_agg and string_agg, but XC does not. Why is this difference?

As of now, in XC, for GROUP BY queries the coordinator receives plain
data from the data nodes, stripped of any aggregates or GROUP BY clause.
I was trying to use the PG mechanism to calculate the aggregates (so as
to enable GROUP BY clauses quickly). It worked for AVG, but for SUM it
ended up calling numeric_int8() because of the above entries, which hit a
segfault since the data passed to it is not numeric. Given that, it's
important to know whether those differences matter. NOTE: This won't be
the final version of GROUP BY support. I am trying to design it in such a
way that we can push GROUP BY down to the data nodes.

The changes were added by commit 8326f619.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
|
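As a reference point while comparing the two catalogs, a query along
these lines (using only standard pg_aggregate columns) shows each
aggregate's transition type next to its final function, which makes a
mismatch like the numeric_int8() one visible at a glance:

    SELECT aggfnoid::regproc     AS aggregate,
           aggtranstype::regtype AS transition_type,
           aggfinalfn::regproc   AS final_function
    FROM pg_aggregate
    WHERE aggfinalfn != 0;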
From: Michael P. <mic...@gm...> - 2011-04-11 23:15:21
|
Hi all,

Postgres-XC 0.9.4 has been available in the GIT repository since last
week, and now the tar file and all the manuals have been released here:
https://sourceforge.net/projects/postgres-xc/files/Version_0.9.4/

--
Thanks,

Michael Paquier
http://michael.otacoo.com
|
From: Koichi S. <koi...@gm...> - 2011-04-08 08:26:45
|
Yeah. I believe the first cut is very important to make the solution
general. We used to take the second cut first, and this limited the range
of statements we could cover. (It was very useful for running DBT-1,
though.)

Regards;
----------
Koichi Suzuki

2011/4/8 Ashutosh Bapat <ash...@en...>:
> Before I forget, Koichi, your sourceforge address bounced, so my mail
> only reached Mason. Thanks, Mason, for including others in the thread
> again.
>
> On Fri, Apr 8, 2011 at 6:04 AM, Koichi Suzuki <koi...@gm...> wrote:
>>
>> Thank you for the valuable advice. We should also think about how GROUP
>> BY can be pushed down to the data nodes so that the coordinator can
>> simply merge the result.
>
> If I understand it right, Mason has already given a solution to that
> problem. We push the GROUP BY down to the data nodes with an additional
> ORDER BY clause (ordering based on the GROUP BY expressions) on top of
> them (this looks tricky if there are already other ORDER BY clauses).
> Thus the data we collect at the coordinator is already grouped per data
> node, and all the coordinator has to do is consolidate the rows from the
> data nodes in order, like we do in a merge sort.
>
> I see that we may have to do this in steps.
> 1. First cut - apply GROUP BY only at the coordinator. Not so efficient,
> but it will make GROUP BY work.
> 2. Second cut - push GROUP BY down to the data nodes. Better than the
> first, but the grouping at the coordinator is still not that efficient.
> 3. Third cut - implement the above idea fully.
>
> We might be able to do 1 and 2 in the first cut itself, but it is too
> early to say anything. I will get back once I know things better.
>
> Mason has also pointed to the possibility of distributing the grouping
> phase at the coordinator across the data nodes (in the third cut) so
> that the coordinator is not loaded if there are too many groups. But
> that requires infrastructure to ship rows from the coordinator to the
> data nodes. This infrastructure is not in place, I think. So that is a
> far possibility for now.
>
>>
>> ----------
>> Koichi Suzuki
>>
>> 2011/4/7 Mason <ma...@us...>:
>> > I looked at the schedule.
>> >
>> > I am not sure about the planned design for GROUP BY, but originally
>> > Andrei was planning on making it somewhat similar to ORDER BY, where
>> > ORDER BY does a merge sort on the coordinator based on sorted results
>> > from the data nodes. Each data node could do the beginning phase of
>> > aggregation in groups and then sort the output by the grouping
>> > expressions. Then the coordinator could do the last step of
>> > aggregation with like groups, which are easy to combine on the fly
>> > because of the sorting coming in from the data nodes (avoiding
>> > materialization).
>> >
>> > This should work pretty well. One drawback is if someone chooses a
>> > GROUP BY clause with many groups (many = thousands+). Then some
>> > parallelism is lost because the final phase is done in only one
>> > place, on the coordinator. GridSQL spreads out the final aggregation
>> > phase amongst all the data nodes, moving like groups to the same node
>> > to get more parallelism. I think row-shipping infrastructure might
>> > have to be in place first before implementing that, and there will be
>> > a noticeable benefit only once there are many, many groups, so I
>> > don't see it being a critical thing at this phase; it can be added
>> > later.
>> >
>> > Regards,
>> >
>> > Mason
>> >
>> > On Thu, Apr 7, 2011 at 5:27 AM, Koichi Suzuki
>> > <koi...@us...> wrote:
>> >> Project "Postgres-XC documentation".
>> >>
>> >> The branch, master has been updated
>> >>        via  62434399fdd57aff2701e3e5e97fed619f6d6820 (commit)
>> >>       from  252519c2be5309a3682b0ee895cf040083ae1784 (commit)
>> >>
>> >> - Log -----------------------------------------------------------------
>> >> commit 62434399fdd57aff2701e3e5e97fed619f6d6820
>> >> Author: Koichi Suzuki <koi...@gm...>
>> >> Date:   Thu Apr 7 18:27:26 2011 +0900
>> >>
>> >>     1. Added 2011FYQ1 schedule for each member.
>> >>     2. Modified my progress sheet of Reference Manual.
>> >>
>> >>     -- Koichi Suzuki
>> >>
>> >> diff --git a/progress/2011FYQ1_Schedule.ods b/progress/2011FYQ1_Schedule.ods
>> >> new file mode 100755
>> >> index 0000000..5e24d37
>> >> Binary files /dev/null and b/progress/2011FYQ1_Schedule.ods differ
>> >> diff --git a/progress/documentation-progress.ods b/progress/documentation-progress.ods
>> >> index 277aade..2c8577e 100644
>> >> Binary files a/progress/documentation-progress.ods and b/progress/documentation-progress.ods differ
>> >>
>> >> -----------------------------------------------------------------------
>> >>
>> >> Summary of changes:
>> >>  progress/2011FYQ1_Schedule.ods      | Bin 0 -> 22147 bytes
>> >>  progress/documentation-progress.ods | Bin 16883 -> 19519 bytes
>> >>  2 files changed, 0 insertions(+), 0 deletions(-)
>> >>  create mode 100755 progress/2011FYQ1_Schedule.ods
>> >>
>> >> hooks/post-receive
>> >> --
>> >> Postgres-XC documentation
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Enterprise Postgres Company
>
|
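A hedged sketch of the pushdown being discussed, for a hypothetical query
SELECT dept, count(*) FROM emp GROUP BY dept (the table emp and column
dept are made-up names):

    -- What each data node might be asked to run (second/third cut):
    SELECT dept, count(*) AS partial_count
    FROM emp
    GROUP BY dept
    ORDER BY dept;  -- sorted group keys let the coordinator merge streams

    -- The coordinator then walks the merged, sorted streams and, for
    -- each run of equal dept values, finishes the aggregate:
    --   total = sum(partial_count) over the matching rows from all nodes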
From: Andrei M. <and...@gm...> - 2011-04-08 07:28:21
|
Hi,

Actually, the ORDER BY does not always have to be pushed down to the data
nodes. Postgres may decide to group by hash, in which case sorting is an
unnecessary operation. However, it may be a problem to determine on the
coordinator whether a data node is going to use sort-based or hash-based
grouping.

Aggregate functions have already been changed so that values are
pre-aggregated on the data nodes and the coordinator completes the
aggregation. This should not be a problem.

2011/4/8 Ashutosh Bapat <ash...@en...>

> Before I forget, Koichi, your sourceforge address bounced, so my mail
> only reached Mason. Thanks, Mason, for including others in the thread
> again.
>
> On Fri, Apr 8, 2011 at 6:04 AM, Koichi Suzuki <koi...@gm...> wrote:
>
>> Thank you for the valuable advice. We should also think about how GROUP
>> BY can be pushed down to the data nodes so that the coordinator can
>> simply merge the result.
>
> If I understand it right, Mason has already given a solution to that
> problem. We push the GROUP BY down to the data nodes with an additional
> ORDER BY clause (ordering based on the GROUP BY expressions) on top of
> them (this looks tricky if there are already other ORDER BY clauses).
> Thus the data we collect at the coordinator is already grouped per data
> node, and all the coordinator has to do is consolidate the rows from the
> data nodes in order, like we do in a merge sort.
>
> I see that we may have to do this in steps.
> 1. First cut - apply GROUP BY only at the coordinator. Not so efficient,
> but it will make GROUP BY work.
> 2. Second cut - push GROUP BY down to the data nodes. Better than the
> first, but the grouping at the coordinator is still not that efficient.
> 3. Third cut - implement the above idea fully.
>
> We might be able to do 1 and 2 in the first cut itself, but it is too
> early to say anything. I will get back once I know things better.
>
> Mason has also pointed to the possibility of distributing the grouping
> phase at the coordinator across the data nodes (in the third cut) so
> that the coordinator is not loaded if there are too many groups. But
> that requires infrastructure to ship rows from the coordinator to the
> data nodes. This infrastructure is not in place, I think. So that is a
> far possibility for now.
>
>>
>> ----------
>> Koichi Suzuki
>>
>> 2011/4/7 Mason <ma...@us...>:
>> > I looked at the schedule.
>> >
>> > I am not sure about the planned design for GROUP BY, but originally
>> > Andrei was planning on making it somewhat similar to ORDER BY, where
>> > ORDER BY does a merge sort on the coordinator based on sorted results
>> > from the data nodes. Each data node could do the beginning phase of
>> > aggregation in groups and then sort the output by the grouping
>> > expressions. Then the coordinator could do the last step of
>> > aggregation with like groups, which are easy to combine on the fly
>> > because of the sorting coming in from the data nodes (avoiding
>> > materialization).
>> >
>> > This should work pretty well. One drawback is if someone chooses a
>> > GROUP BY clause with many groups (many = thousands+). Then some
>> > parallelism is lost because the final phase is done in only one
>> > place, on the coordinator. GridSQL spreads out the final aggregation
>> > phase amongst all the data nodes, moving like groups to the same node
>> > to get more parallelism. I think row-shipping infrastructure might
>> > have to be in place first before implementing that, and there will be
>> > a noticeable benefit only once there are many, many groups, so I
>> > don't see it being a critical thing at this phase; it can be added
>> > later.
>> >
>> > Regards,
>> >
>> > Mason
>> >
>> > On Thu, Apr 7, 2011 at 5:27 AM, Koichi Suzuki
>> > <koi...@us...> wrote:
>> >> Project "Postgres-XC documentation".
>> >>
>> >> The branch, master has been updated
>> >>        via  62434399fdd57aff2701e3e5e97fed619f6d6820 (commit)
>> >>       from  252519c2be5309a3682b0ee895cf040083ae1784 (commit)
>> >>
>> >> - Log -----------------------------------------------------------------
>> >> commit 62434399fdd57aff2701e3e5e97fed619f6d6820
>> >> Author: Koichi Suzuki <koi...@gm...>
>> >> Date:   Thu Apr 7 18:27:26 2011 +0900
>> >>
>> >>     1. Added 2011FYQ1 schedule for each member.
>> >>     2. Modified my progress sheet of Reference Manual.
>> >>
>> >>     -- Koichi Suzuki
>> >>
>> >> diff --git a/progress/2011FYQ1_Schedule.ods b/progress/2011FYQ1_Schedule.ods
>> >> new file mode 100755
>> >> index 0000000..5e24d37
>> >> Binary files /dev/null and b/progress/2011FYQ1_Schedule.ods differ
>> >> diff --git a/progress/documentation-progress.ods b/progress/documentation-progress.ods
>> >> index 277aade..2c8577e 100644
>> >> Binary files a/progress/documentation-progress.ods and b/progress/documentation-progress.ods differ
>> >>
>> >> -----------------------------------------------------------------------
>> >>
>> >> Summary of changes:
>> >>  progress/2011FYQ1_Schedule.ods      | Bin 0 -> 22147 bytes
>> >>  progress/documentation-progress.ods | Bin 16883 -> 19519 bytes
>> >>  2 files changed, 0 insertions(+), 0 deletions(-)
>> >>  create mode 100755 progress/2011FYQ1_Schedule.ods
>> >>
>> >> hooks/post-receive
>> >> --
>> >> Postgres-XC documentation
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Enterprise Postgres Company
>

--
Best regards,
Andrei Martsinchyk mailto:and...@gm...
|
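Andrei's point about hash grouping can be observed with stock PostgreSQL
settings. This is only an illustrative sketch using the standard planner
GUC enable_hashagg (emp and dept are the same made-up names as above);
toggling it shows the planner switching between sort-based and hash-based
grouping:

    SET enable_hashagg = off;  -- forces sort-based GroupAggregate
    EXPLAIN SELECT dept, count(*) FROM emp GROUP BY dept;

    SET enable_hashagg = on;   -- allows HashAggregate again
    EXPLAIN SELECT dept, count(*) FROM emp GROUP BY dept;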