From: Nikhil S. <ni...@st...> - 2012-07-10 18:46:30
|
Oh, but on second thoughts, the PG-XC community will definitely be interested in such a functionality. One way of doing this could be to pass in a new option to pgbench (-L) with a comma separated list of host:port values like ip1:5678,ip2,ip3:3333 etc. If no port is specified, it can default to 5432. This can then be put into a list and the same logic discussed below can be applied to access them. Just might work. Other ideas are welcome too. I don't know if pgbench is THE benchmark tool for XC though :) Regards, Nikhils On Tue, Jul 10, 2012 at 10:15 AM, Nikhil Sontakke <ni...@st...> wrote: > Hi Shankar, > > It is probably fastest if you hack it up for your own use for now. I > would just keep the list of my coordinators in an array. Then in the > doConnect() function I would use a rand/srand call and then just mod > it with the number of servers to get the index into this array. I will > then use this to get the host/port info. The normal PG community would > not be interested in such type of a functionality anyways. > > Regards, > Nikhils > > On Tue, Jul 10, 2012 at 9:49 AM, Shankar Hariharan > <har...@ya...> wrote: >> If no one else is looking at this I can definitely pick it up. Pls let me >> know. >> >> thanks, >> Shankar >> >> ________________________________ >> From: Ashutosh Bapat <ash...@en...> >> To: Nikhil Sontakke <ni...@st...> >> Cc: Koichi Suzuki <koi...@gm...>; Shankar Hariharan >> <har...@ya...>; >> "pos...@li..." >> <pos...@li...> >> Sent: Tuesday, July 10, 2012 7:24 AM >> >> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >> >> >> On Tue, Jul 10, 2012 at 5:46 PM, Nikhil Sontakke <ni...@st...> >> wrote: >> >>> Yes. Although we don't have to care application partitioning based >>> upon the distribution key, it's a good idea to make all the >>> coordinator workload as even as possible. >>> >>> In the case of DBT-1, we ran several DBT-1 process, each produces >>> random transaction but goes to specific coordinator. >>> >>> I think pgbench can do the similar. >>> >> >> Well, a quick look at pgbench.c suggests that changing the doConnect() >> function to pick up a random pghost and pgport set whenever it's >> called should be enough to get this going. >> >> >> That's good. May be we can pick those in round robin fashion to get >> deterministic results. >> >> >> >> Regards, >> Nikhils >> >>> Regards; >>> ---------- >>> Koichi Suzuki >>> >>> >>> 2012/7/10 Ashutosh Bapat <ash...@en...>: >>>> Hi Shankar, >>>> Will it be possible for you to change the pgbench code to dynamically >>>> fire >>>> on all available coordinators? >>>> >>>> Since we use modified DBT-1 for our benchmarking, we haven't got to the >>>> point where we can modify pg_bench to suite XC. But that's something, we >>>> will welcome if anybody is interested. >>>> >>>> >>>> On Mon, Jul 9, 2012 at 9:41 PM, Shankar Hariharan >>>> <har...@ya...> wrote: >>>>> >>>>> Thanks Ashutosh. You are right, while running this test i just had >>>>> pgbench >>>>> running against one coordinator. Looks like pgbench by itself may not be >>>>> an >>>>> apt tool for this kind of testing, I will instead run pgbench's >>>>> underlying >>>>> sql script from cmdline against either coordinators. Thanks for that >>>>> tip. >>>>> >>>>> I got a lot of input on my problem from a lot of folks on the list, the >>>>> feedback is much appreciated. Thanks everybody! >>>>> >>>>> On max_prepared_transactions, I will factor in the number of >>>>> coordinators >>>>> and the max_connections on each coordinator while arriving at a figure. 
>>>>> Will also try out Koichi Suzuki's suggestion to have multiple NICs on >>>>> the >>>>> GTM. I will post my findings here for the same cluster configuration as >>>>> before. >>>>> >>>>> thanks, >>>>> Shankar >>>>> >>>>> ________________________________ >>>>> From: Ashutosh Bapat <ash...@en...> >>>>> To: Shankar Hariharan <har...@ya...> >>>>> Cc: "pos...@li..." >>>>> <pos...@li...> >>>>> Sent: Sunday, July 8, 2012 11:02 PM >>>>> >>>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>>> >>>>> Hi Shankar, >>>>> You have got answers to the prepared transaction problem, I guess. I >>>>> have >>>>> something else below. >>>>> >>>>> On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan >>>>> <har...@ya...> wrote: >>>>> >>>>> As planned I ran some tests using PGBench on this setup : >>>>> >>>>> Node 1 - Coord1, Datanode1, gtm-proxy1 >>>>> Node 2- Coord2, Datanode2, gtm-proxy2 >>>>> Node 3- Datanode3, gtm >>>>> >>>>> I was connecting via Coord1 for these tests: >>>>> - scale factor of 30 used >>>>> - tests run using the following input parameters for pgbench: >>>>> >>>>> >>>>> Try connecting to both the coordinators, it should give you better >>>>> performance, esp, when you are using distributed tables. With >>>>> distributed >>>>> tables, coordinator gets involved in query execution more than that in >>>>> the >>>>> case of replicated tables. So, balancing load across two coordinators >>>>> would >>>>> help. >>>>> >>>>> >>>>> >>>>> Clients Threads Duration Transactions >>>>> 1 1 100 6204 >>>>> 2 2 100 9960 >>>>> 4 4 100 12880 >>>>> 6 6 100 1676 >>>>> >>>>> >>>>> >>>>> 8 >>>>> 8 8 100 19758 >>>>> 10 10 100 21944 >>>>> 12 12 100 20674 >>>>> >>>>> The run went well until the 8 clients. I started seeing errors on 10 >>>>> clients onwards and eventually the 14 client run has been hanging around >>>>> for >>>>> over an hour now. The errors I have been seeing on console are the >>>>> following >>>>> : >>>>> >>>>> pgbench console : >>>>> Client 8 aborted in state 12: ERROR: GTM error, could not obtain >>>>> snapshot >>>>> Client 0 aborted in state 13: ERROR: maximum number of prepared >>>>> transactions reached >>>>> Client 7 aborted in state 13: ERROR: maximum number of prepared >>>>> transactions reached >>>>> Client 11 aborted in state 13: ERROR: maximum number of prepared >>>>> transactions reached >>>>> Client 9 aborted in state 13: ERROR: maximum number of prepared >>>>> transactions reached >>>>> >>>>> node console: >>>>> ERROR: GTM error, could not obtain snapshot >>>>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >>>>> VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); >>>>> ERROR: maximum number of prepared transactions reached >>>>> HINT: Increase max_prepared_transactions (currently 10). >>>>> STATEMENT: PREPARE TRANSACTION 'T201428' >>>>> ERROR: maximum number of prepared transactions reached >>>>> STATEMENT: END; >>>>> ERROR: maximum number of prepared transactions reached >>>>> STATEMENT: END; >>>>> ERROR: maximum number of prepared transactions reached >>>>> STATEMENT: END; >>>>> ERROR: maximum number of prepared transactions reached >>>>> STATEMENT: END; >>>>> ERROR: GTM error, could not obtain snapshot >>>>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >>>>> VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); >>>>> >>>>> I was also watching the processes on each node and see the following for >>>>> the 14 client run: >>>>> >>>>> >>>>> Node1 : >>>>> postgres 25571 10511 0 04:41 ? 
00:00:02 postgres: postgres >>>>> postgres ::1(33481) TRUNCATE TABLE waiting >>>>> postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres >>>>> postgres pgbench-address (50388) TRUNCATE TABLE >>>>> >>>>> Node2: >>>>> postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres >>>>> postgres coord1-address(57357) idle in transaction >>>>> >>>>> Node3: >>>>> postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres >>>>> postgres coord1-address(51406) TRUNCATE TABLE waiting >>>>> >>>>> >>>>> I was going to restart the processes on all nodes and start over but did >>>>> not want to lose this data as it could be useful information. >>>>> >>>>> Any explanation on the above issue is much appreciated. I will try the >>>>> next run with a higher value set for max_prepared_transactions. Any >>>>> recommendations for a good value on this front? >>>>> >>>>> thanks, >>>>> Shankar >>>>> >>>>> >>>>> ________________________________ >>>>> From: Shankar Hariharan <har...@ya...> >>>>> To: Ashutosh Bapat <ash...@en...> >>>>> Cc: "pos...@li..." >>>>> <pos...@li...> >>>>> Sent: Friday, July 6, 2012 8:22 AM >>>>> >>>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>>> >>>>> Hi Ashutosh, >>>>> I was trying to size the load on a server and was wondering if a GTM >>>>> could be shared w/o much performance overhead between a small number of >>>>> datanodes and coordinators. I will post my findings here. >>>>> thanks, >>>>> Shankar >>>>> >>>>> ________________________________ >>>>> From: Ashutosh Bapat <ash...@en...> >>>>> To: Shankar Hariharan <har...@ya...> >>>>> Cc: "pos...@li..." >>>>> <pos...@li...> >>>>> Sent: Friday, July 6, 2012 12:25 AM >>>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>>> >>>>> Hi Shankar, >>>>> Running gtm-proxy has shown to improve the performance, because it >>>>> lessens >>>>> the load on GTM, by serving requests locally. Why do you want the >>>>> coordinators to connect directly to the GTM? Are you seeing any >>>>> performance >>>>> improvement from doing that? >>>>> >>>>> On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan >>>>> <har...@ya...> wrote: >>>>> >>>>> Follow up to earlier email. In the setup described below, can I avoid >>>>> using a gtm-proxy? That is, can I just simply point coordinators to the >>>>> one >>>>> gtm running on node 3 ? >>>>> My initial plan was to just run the gtm on node 3 then I thought I could >>>>> try a datanode without a local coordinator which was why I put these two >>>>> together on node 3. >>>>> thanks, >>>>> Shankar >>>>> >>>>> ________________________________ >>>>> From: Shankar Hariharan <har...@ya...> >>>>> To: "pos...@li..." >>>>> <pos...@li...> >>>>> Sent: Thursday, July 5, 2012 11:35 PM >>>>> Subject: Question on multiple coordinators >>>>> >>>>> Hello, >>>>> >>>>> Am trying out XC 1.0 in the following configuraiton. >>>>> Node 1 - Coord1, Datanode1, gtm-proxy1 >>>>> Node 2- Coord2, Datanode2, gtm-proxy2 >>>>> Node 3- Datanode3, gtm >>>>> >>>>> I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In >>>>> addition I missed the pg_hba edit as well. So the first table T1 that I >>>>> created for distribution from Coord1 was not "visible| from Coord2 but >>>>> was >>>>> on all the data nodes. >>>>> I tried to get Coord2 backinto business in various ways but the first >>>>> table I created refused to show up on Coord2 : >>>>> - edit pg_hba and add node on both coord1 and 2. 
Then run select >>>>> pgxc_pool_reload(); >>>>> - restart coord 1 and 2 >>>>> - drop node c2 from c1 and c1 from c2 and add them back followed by >>>>> select >>>>> pgxc_pool_reload(); >>>>> >>>>> So I tried to create the same table T1 from Coord2 to observe behavior >>>>> and >>>>> it did not like it clearly as all nodes it "wrote" to reported that the >>>>> table already existed which was good. At this point I could understand >>>>> that >>>>> Coord2 and Coord1 are not talking alright so I created a new table from >>>>> coord1 with replication. This table was visible from both now. >>>>> >>>>> Question is should I expect to see the first table, let me call it T1 >>>>> after a while from Coord2 also? >>>>> >>>>> >>>>> thanks, >>>>> Shankar >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> Live Security Virtual Conference >>>>> Exclusive live event will cover all the ways today's security and >>>>> threat landscape has changed and how IT managers can respond. >>>>> Discussions >>>>> will include endpoint security, mobile security and the latest in >>>>> malware >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>> _______________________________________________ >>>>> Postgres-xc-developers mailing list >>>>> Pos...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Wishes, >>>>> Ashutosh Bapat >>>>> EntepriseDB Corporation >>>>> The Enterprise Postgres Company >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Wishes, >>>>> Ashutosh Bapat >>>>> EntepriseDB Corporation >>>>> The Enterprise Postgres Company >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. Discussions >>>> will include endpoint security, mobile security and the latest in malware >>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... >>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... 
>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> >> >> -- >> StormDB - http://www.stormdb.com >> The Database Cloud >> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> >> > > > > -- > StormDB - http://www.stormdb.com > The Database Cloud -- StormDB - http://www.stormdb.com The Database Cloud |
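A minimal sketch of the -L idea described above, assuming a comma-separated host:port argument with 5432 as the default port. The HostPort struct and parse_host_list() are illustrative names for this note only, not existing pgbench code, and a real patch would feed the parsed entries into doConnect() as discussed in the thread.

    /* Parse "ip1:5678,ip2,ip3:3333" into host/port pairs; port defaults to 5432. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct HostPort
    {
        char host[256];
        char port[16];
    } HostPort;

    static int
    parse_host_list(const char *arg, HostPort *out, int max)
    {
        char   *copy = strdup(arg);
        char   *tok;
        int     n = 0;

        for (tok = strtok(copy, ","); tok != NULL && n < max; tok = strtok(NULL, ","))
        {
            char *colon = strchr(tok, ':');

            if (colon)
            {
                *colon = '\0';
                snprintf(out[n].port, sizeof(out[n].port), "%s", colon + 1);
            }
            else
                snprintf(out[n].port, sizeof(out[n].port), "5432");
            snprintf(out[n].host, sizeof(out[n].host), "%s", tok);
            n++;
        }
        free(copy);
        return n;
    }

    int
    main(void)
    {
        HostPort    coords[16];
        int         n = parse_host_list("ip1:5678,ip2,ip3:3333", coords, 16);

        for (int i = 0; i < n; i++)
            printf("coordinator %d: %s:%s\n", i, coords[i].host, coords[i].port);
        return 0;
    }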
From: Shankar H. <har...@ya...> - 2012-07-10 14:18:12
|
Hi Koichi, That is what pgbench is doing, going to just one coordinator on my cluster. I did set max_prepared_transactions to an appropriate value and could see a performance bump but I did run into errors indicated below and eventually the pgbench run locked up for 14 clients. Would be good to understand the cause of this error. 10 clients: Client 9 aborted in state 12: ERROR: GTM error, could not obtain snapshot 12 clients: Client 11 aborted in state 11: ERROR: GTM error, could not obtain snapshot Client 8 aborted in state 11: ERROR: GTM error, could not obtain snapshot thanks, Shankar ________________________________ From: Koichi Suzuki <koi...@gm...> To: Ashutosh Bapat <ash...@en...> Cc: Shankar Hariharan <har...@ya...>; "pos...@li..." <pos...@li...> Sent: Monday, July 9, 2012 11:51 PM Subject: Re: [Postgres-xc-developers] Question on gtm-proxy Yes. Although we don't have to care application partitioning based upon the distribution key, it's a good idea to make all the coordinator workload as even as possible. In the case of DBT-1, we ran several DBT-1 process, each produces random transaction but goes to specific coordinator. I think pgbench can do the similar. Regards; ---------- Koichi Suzuki 2012/7/10 Ashutosh Bapat <ash...@en...>: > Hi Shankar, > Will it be possible for you to change the pgbench code to dynamically fire > on all available coordinators? > > Since we use modified DBT-1 for our benchmarking, we haven't got to the > point where we can modify pg_bench to suite XC. But that's something, we > will welcome if anybody is interested. > > > On Mon, Jul 9, 2012 at 9:41 PM, Shankar Hariharan > <har...@ya...> wrote: >> >> Thanks Ashutosh. You are right, while running this test i just had pgbench >> running against one coordinator. Looks like pgbench by itself may not be an >> apt tool for this kind of testing, I will instead run pgbench's underlying >> sql script from cmdline against either coordinators. Thanks for that tip. >> >> I got a lot of input on my problem from a lot of folks on the list, the >> feedback is much appreciated. Thanks everybody! >> >> On max_prepared_transactions, I will factor in the number of coordinators >> and the max_connections on each coordinator while arriving at a figure. >> Will also try out Koichi Suzuki's suggestion to have multiple NICs on the >> GTM. I will post my findings here for the same cluster configuration as >> before. >> >> thanks, >> Shankar >> >> ________________________________ >> From: Ashutosh Bapat <ash...@en...> >> To: Shankar Hariharan <har...@ya...> >> Cc: "pos...@li..." >> <pos...@li...> >> Sent: Sunday, July 8, 2012 11:02 PM >> >> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >> Hi Shankar, >> You have got answers to the prepared transaction problem, I guess. I have >> something else below. >> >> On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan >> <har...@ya...> wrote: >> >> As planned I ran some tests using PGBench on this setup : >> >> Node 1 - Coord1, Datanode1, gtm-proxy1 >> Node 2- Coord2, Datanode2, gtm-proxy2 >> Node 3- Datanode3, gtm >> >> I was connecting via Coord1 for these tests: >> - scale factor of 30 used >> - tests run using the following input parameters for pgbench: >> >> >> Try connecting to both the coordinators, it should give you better >> performance, esp, when you are using distributed tables. With distributed >> tables, coordinator gets involved in query execution more than that in the >> case of replicated tables. So, balancing load across two coordinators would >> help. 
>> >> >> >> Clients Threads Duration Transactions >> 1 1 100 6204 >> 2 2 100 9960 >> 4 4 100 12880 >> 6 6 100 1676 >> >> >> >> 8 >> 8 8 100 19758 >> 10 10 100 21944 >> 12 12 100 20674 >> >> The run went well until the 8 clients. I started seeing errors on 10 >> clients onwards and eventually the 14 client run has been hanging around for >> over an hour now. The errors I have been seeing on console are the following >> : >> >> pgbench console : >> Client 8 aborted in state 12: ERROR: GTM error, could not obtain snapshot >> Client 0 aborted in state 13: ERROR: maximum number of prepared >> transactions reached >> Client 7 aborted in state 13: ERROR: maximum number of prepared >> transactions reached >> Client 11 aborted in state 13: ERROR: maximum number of prepared >> transactions reached >> Client 9 aborted in state 13: ERROR: maximum number of prepared >> transactions reached >> >> node console: >> ERROR: GTM error, could not obtain snapshot >> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >> VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); >> ERROR: maximum number of prepared transactions reached >> HINT: Increase max_prepared_transactions (currently 10). >> STATEMENT: PREPARE TRANSACTION 'T201428' >> ERROR: maximum number of prepared transactions reached >> STATEMENT: END; >> ERROR: maximum number of prepared transactions reached >> STATEMENT: END; >> ERROR: maximum number of prepared transactions reached >> STATEMENT: END; >> ERROR: maximum number of prepared transactions reached >> STATEMENT: END; >> ERROR: GTM error, could not obtain snapshot >> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >> VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); >> >> I was also watching the processes on each node and see the following for >> the 14 client run: >> >> >> Node1 : >> postgres 25571 10511 0 04:41 ? 00:00:02 postgres: postgres >> postgres ::1(33481) TRUNCATE TABLE waiting >> postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres >> postgres pgbench-address (50388) TRUNCATE TABLE >> >> Node2: >> postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres >> postgres coord1-address(57357) idle in transaction >> >> Node3: >> postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres >> postgres coord1-address(51406) TRUNCATE TABLE waiting >> >> >> I was going to restart the processes on all nodes and start over but did >> not want to lose this data as it could be useful information. >> >> Any explanation on the above issue is much appreciated. I will try the >> next run with a higher value set for max_prepared_transactions. Any >> recommendations for a good value on this front? >> >> thanks, >> Shankar >> >> >> ________________________________ >> From: Shankar Hariharan <har...@ya...> >> To: Ashutosh Bapat <ash...@en...> >> Cc: "pos...@li..." >> <pos...@li...> >> Sent: Friday, July 6, 2012 8:22 AM >> >> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >> Hi Ashutosh, >> I was trying to size the load on a server and was wondering if a GTM >> could be shared w/o much performance overhead between a small number of >> datanodes and coordinators. I will post my findings here. >> thanks, >> Shankar >> >> ________________________________ >> From: Ashutosh Bapat <ash...@en...> >> To: Shankar Hariharan <har...@ya...> >> Cc: "pos...@li..." 
>> <pos...@li...> >> Sent: Friday, July 6, 2012 12:25 AM >> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >> Hi Shankar, >> Running gtm-proxy has shown to improve the performance, because it lessens >> the load on GTM, by serving requests locally. Why do you want the >> coordinators to connect directly to the GTM? Are you seeing any performance >> improvement from doing that? >> >> On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan >> <har...@ya...> wrote: >> >> Follow up to earlier email. In the setup described below, can I avoid >> using a gtm-proxy? That is, can I just simply point coordinators to the one >> gtm running on node 3 ? >> My initial plan was to just run the gtm on node 3 then I thought I could >> try a datanode without a local coordinator which was why I put these two >> together on node 3. >> thanks, >> Shankar >> >> ________________________________ >> From: Shankar Hariharan <har...@ya...> >> To: "pos...@li..." >> <pos...@li...> >> Sent: Thursday, July 5, 2012 11:35 PM >> Subject: Question on multiple coordinators >> >> Hello, >> >> Am trying out XC 1.0 in the following configuraiton. >> Node 1 - Coord1, Datanode1, gtm-proxy1 >> Node 2- Coord2, Datanode2, gtm-proxy2 >> Node 3- Datanode3, gtm >> >> I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In >> addition I missed the pg_hba edit as well. So the first table T1 that I >> created for distribution from Coord1 was not "visible| from Coord2 but was >> on all the data nodes. >> I tried to get Coord2 backinto business in various ways but the first >> table I created refused to show up on Coord2 : >> - edit pg_hba and add node on both coord1 and 2. Then run select >> pgxc_pool_reload(); >> - restart coord 1 and 2 >> - drop node c2 from c1 and c1 from c2 and add them back followed by select >> pgxc_pool_reload(); >> >> So I tried to create the same table T1 from Coord2 to observe behavior and >> it did not like it clearly as all nodes it "wrote" to reported that the >> table already existed which was good. At this point I could understand that >> Coord2 and Coord1 are not talking alright so I created a new table from >> coord1 with replication. This table was visible from both now. >> >> Question is should I expect to see the first table, let me call it T1 >> after a while from Coord2 also? >> >> >> thanks, >> Shankar >> >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... 
>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> >> >> >> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
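A rough worked example of the max_prepared_transactions sizing mentioned above (an assumption drawn from this discussion, not an official Postgres-XC rule): since the coordinators drive writes on the datanodes through two-phase commit, each concurrent coordinator session can hold one prepared transaction open on a datanode at a time. With two coordinators at max_connections = 100 each, a datanode could in the worst case see 2 x 100 = 200 prepared transactions, so setting max_prepared_transactions to at least 200 on the datanodes, rather than the 10 reported in the quoted error, leaves headroom for this workload.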
From: Nikhil S. <ni...@st...> - 2012-07-10 14:16:09
|
Hi Shankar, It is probably fastest if you hack it up for your own use for now. I would just keep the list of my coordinators in an array. Then in the doConnect() function I would use a rand/srand call and then just mod it with the number of servers to get the index into this array. I will then use this to get the host/port info. The normal PG community would not be interested in such type of a functionality anyways. Regards, Nikhils On Tue, Jul 10, 2012 at 9:49 AM, Shankar Hariharan <har...@ya...> wrote: > If no one else is looking at this I can definitely pick it up. Pls let me > know. > > thanks, > Shankar > > ________________________________ > From: Ashutosh Bapat <ash...@en...> > To: Nikhil Sontakke <ni...@st...> > Cc: Koichi Suzuki <koi...@gm...>; Shankar Hariharan > <har...@ya...>; > "pos...@li..." > <pos...@li...> > Sent: Tuesday, July 10, 2012 7:24 AM > > Subject: Re: [Postgres-xc-developers] Question on gtm-proxy > > > > On Tue, Jul 10, 2012 at 5:46 PM, Nikhil Sontakke <ni...@st...> > wrote: > >> Yes. Although we don't have to care application partitioning based >> upon the distribution key, it's a good idea to make all the >> coordinator workload as even as possible. >> >> In the case of DBT-1, we ran several DBT-1 process, each produces >> random transaction but goes to specific coordinator. >> >> I think pgbench can do the similar. >> > > Well, a quick look at pgbench.c suggests that changing the doConnect() > function to pick up a random pghost and pgport set whenever it's > called should be enough to get this going. > > > That's good. May be we can pick those in round robin fashion to get > deterministic results. > > > > Regards, > Nikhils > >> Regards; >> ---------- >> Koichi Suzuki >> >> >> 2012/7/10 Ashutosh Bapat <ash...@en...>: >>> Hi Shankar, >>> Will it be possible for you to change the pgbench code to dynamically >>> fire >>> on all available coordinators? >>> >>> Since we use modified DBT-1 for our benchmarking, we haven't got to the >>> point where we can modify pg_bench to suite XC. But that's something, we >>> will welcome if anybody is interested. >>> >>> >>> On Mon, Jul 9, 2012 at 9:41 PM, Shankar Hariharan >>> <har...@ya...> wrote: >>>> >>>> Thanks Ashutosh. You are right, while running this test i just had >>>> pgbench >>>> running against one coordinator. Looks like pgbench by itself may not be >>>> an >>>> apt tool for this kind of testing, I will instead run pgbench's >>>> underlying >>>> sql script from cmdline against either coordinators. Thanks for that >>>> tip. >>>> >>>> I got a lot of input on my problem from a lot of folks on the list, the >>>> feedback is much appreciated. Thanks everybody! >>>> >>>> On max_prepared_transactions, I will factor in the number of >>>> coordinators >>>> and the max_connections on each coordinator while arriving at a figure. >>>> Will also try out Koichi Suzuki's suggestion to have multiple NICs on >>>> the >>>> GTM. I will post my findings here for the same cluster configuration as >>>> before. >>>> >>>> thanks, >>>> Shankar >>>> >>>> ________________________________ >>>> From: Ashutosh Bapat <ash...@en...> >>>> To: Shankar Hariharan <har...@ya...> >>>> Cc: "pos...@li..." >>>> <pos...@li...> >>>> Sent: Sunday, July 8, 2012 11:02 PM >>>> >>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>> >>>> Hi Shankar, >>>> You have got answers to the prepared transaction problem, I guess. I >>>> have >>>> something else below. 
>>>> >>>> On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan >>>> <har...@ya...> wrote: >>>> >>>> As planned I ran some tests using PGBench on this setup : >>>> >>>> Node 1 - Coord1, Datanode1, gtm-proxy1 >>>> Node 2- Coord2, Datanode2, gtm-proxy2 >>>> Node 3- Datanode3, gtm >>>> >>>> I was connecting via Coord1 for these tests: >>>> - scale factor of 30 used >>>> - tests run using the following input parameters for pgbench: >>>> >>>> >>>> Try connecting to both the coordinators, it should give you better >>>> performance, esp, when you are using distributed tables. With >>>> distributed >>>> tables, coordinator gets involved in query execution more than that in >>>> the >>>> case of replicated tables. So, balancing load across two coordinators >>>> would >>>> help. >>>> >>>> >>>> >>>> Clients Threads Duration Transactions >>>> 1 1 100 6204 >>>> 2 2 100 9960 >>>> 4 4 100 12880 >>>> 6 6 100 1676 >>>> >>>> >>>> >>>> 8 >>>> 8 8 100 19758 >>>> 10 10 100 21944 >>>> 12 12 100 20674 >>>> >>>> The run went well until the 8 clients. I started seeing errors on 10 >>>> clients onwards and eventually the 14 client run has been hanging around >>>> for >>>> over an hour now. The errors I have been seeing on console are the >>>> following >>>> : >>>> >>>> pgbench console : >>>> Client 8 aborted in state 12: ERROR: GTM error, could not obtain >>>> snapshot >>>> Client 0 aborted in state 13: ERROR: maximum number of prepared >>>> transactions reached >>>> Client 7 aborted in state 13: ERROR: maximum number of prepared >>>> transactions reached >>>> Client 11 aborted in state 13: ERROR: maximum number of prepared >>>> transactions reached >>>> Client 9 aborted in state 13: ERROR: maximum number of prepared >>>> transactions reached >>>> >>>> node console: >>>> ERROR: GTM error, could not obtain snapshot >>>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >>>> VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); >>>> ERROR: maximum number of prepared transactions reached >>>> HINT: Increase max_prepared_transactions (currently 10). >>>> STATEMENT: PREPARE TRANSACTION 'T201428' >>>> ERROR: maximum number of prepared transactions reached >>>> STATEMENT: END; >>>> ERROR: maximum number of prepared transactions reached >>>> STATEMENT: END; >>>> ERROR: maximum number of prepared transactions reached >>>> STATEMENT: END; >>>> ERROR: maximum number of prepared transactions reached >>>> STATEMENT: END; >>>> ERROR: GTM error, could not obtain snapshot >>>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >>>> VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); >>>> >>>> I was also watching the processes on each node and see the following for >>>> the 14 client run: >>>> >>>> >>>> Node1 : >>>> postgres 25571 10511 0 04:41 ? 00:00:02 postgres: postgres >>>> postgres ::1(33481) TRUNCATE TABLE waiting >>>> postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres >>>> postgres pgbench-address (50388) TRUNCATE TABLE >>>> >>>> Node2: >>>> postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres >>>> postgres coord1-address(57357) idle in transaction >>>> >>>> Node3: >>>> postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres >>>> postgres coord1-address(51406) TRUNCATE TABLE waiting >>>> >>>> >>>> I was going to restart the processes on all nodes and start over but did >>>> not want to lose this data as it could be useful information. >>>> >>>> Any explanation on the above issue is much appreciated. 
I will try the >>>> next run with a higher value set for max_prepared_transactions. Any >>>> recommendations for a good value on this front? >>>> >>>> thanks, >>>> Shankar >>>> >>>> >>>> ________________________________ >>>> From: Shankar Hariharan <har...@ya...> >>>> To: Ashutosh Bapat <ash...@en...> >>>> Cc: "pos...@li..." >>>> <pos...@li...> >>>> Sent: Friday, July 6, 2012 8:22 AM >>>> >>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>> >>>> Hi Ashutosh, >>>> I was trying to size the load on a server and was wondering if a GTM >>>> could be shared w/o much performance overhead between a small number of >>>> datanodes and coordinators. I will post my findings here. >>>> thanks, >>>> Shankar >>>> >>>> ________________________________ >>>> From: Ashutosh Bapat <ash...@en...> >>>> To: Shankar Hariharan <har...@ya...> >>>> Cc: "pos...@li..." >>>> <pos...@li...> >>>> Sent: Friday, July 6, 2012 12:25 AM >>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>> >>>> Hi Shankar, >>>> Running gtm-proxy has shown to improve the performance, because it >>>> lessens >>>> the load on GTM, by serving requests locally. Why do you want the >>>> coordinators to connect directly to the GTM? Are you seeing any >>>> performance >>>> improvement from doing that? >>>> >>>> On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan >>>> <har...@ya...> wrote: >>>> >>>> Follow up to earlier email. In the setup described below, can I avoid >>>> using a gtm-proxy? That is, can I just simply point coordinators to the >>>> one >>>> gtm running on node 3 ? >>>> My initial plan was to just run the gtm on node 3 then I thought I could >>>> try a datanode without a local coordinator which was why I put these two >>>> together on node 3. >>>> thanks, >>>> Shankar >>>> >>>> ________________________________ >>>> From: Shankar Hariharan <har...@ya...> >>>> To: "pos...@li..." >>>> <pos...@li...> >>>> Sent: Thursday, July 5, 2012 11:35 PM >>>> Subject: Question on multiple coordinators >>>> >>>> Hello, >>>> >>>> Am trying out XC 1.0 in the following configuraiton. >>>> Node 1 - Coord1, Datanode1, gtm-proxy1 >>>> Node 2- Coord2, Datanode2, gtm-proxy2 >>>> Node 3- Datanode3, gtm >>>> >>>> I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In >>>> addition I missed the pg_hba edit as well. So the first table T1 that I >>>> created for distribution from Coord1 was not "visible| from Coord2 but >>>> was >>>> on all the data nodes. >>>> I tried to get Coord2 backinto business in various ways but the first >>>> table I created refused to show up on Coord2 : >>>> - edit pg_hba and add node on both coord1 and 2. Then run select >>>> pgxc_pool_reload(); >>>> - restart coord 1 and 2 >>>> - drop node c2 from c1 and c1 from c2 and add them back followed by >>>> select >>>> pgxc_pool_reload(); >>>> >>>> So I tried to create the same table T1 from Coord2 to observe behavior >>>> and >>>> it did not like it clearly as all nodes it "wrote" to reported that the >>>> table already existed which was good. At this point I could understand >>>> that >>>> Coord2 and Coord1 are not talking alright so I created a new table from >>>> coord1 with replication. This table was visible from both now. >>>> >>>> Question is should I expect to see the first table, let me call it T1 >>>> after a while from Coord2 also? 
>>>> >>>> >>>> thanks, >>>> Shankar >>>> >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> will include endpoint security, mobile security and the latest in >>>> malware >>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... >>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > -- > StormDB - http://www.stormdb.com > The Database Cloud > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > -- StormDB - http://www.stormdb.com The Database Cloud |
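A rough sketch of the random-coordinator hack Nikhil describes above. The coord_hosts/coord_ports arrays and pick_random_coordinator() are illustrative names for this note, not actual pgbench.c code; in pgbench the choice would happen inside doConnect() just before the connection is opened.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static const char *coord_hosts[] = {"node1", "node2"};
    static const char *coord_ports[] = {"5432", "5432"};
    static const int   n_coords = 2;

    /* Pick an index the way the mail suggests: seed once, then rand() % n. */
    static int
    pick_random_coordinator(void)
    {
        static int seeded = 0;

        if (!seeded)
        {
            srand((unsigned) time(NULL));
            seeded = 1;
        }
        return rand() % n_coords;
    }

    int
    main(void)
    {
        /* Each new client session lands on a randomly chosen coordinator. */
        int i = pick_random_coordinator();

        printf("connect to %s:%s\n", coord_hosts[i], coord_ports[i]);
        return 0;
    }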
From: Shankar H. <har...@ya...> - 2012-07-10 13:49:37
|
If no one else is looking at this I can definitely pick it up. Pls let me know. thanks, Shankar ________________________________ From: Ashutosh Bapat <ash...@en...> To: Nikhil Sontakke <ni...@st...> Cc: Koichi Suzuki <koi...@gm...>; Shankar Hariharan <har...@ya...>; "pos...@li..." <pos...@li...> Sent: Tuesday, July 10, 2012 7:24 AM Subject: Re: [Postgres-xc-developers] Question on gtm-proxy On Tue, Jul 10, 2012 at 5:46 PM, Nikhil Sontakke <ni...@st...> wrote: > Yes. Although we don't have to care application partitioning based >> upon the distribution key, it's a good idea to make all the >> coordinator workload as even as possible. >> >> In the case of DBT-1, we ran several DBT-1 process, each produces >> random transaction but goes to specific coordinator. >> >> I think pgbench can do the similar. >> > >Well, a quick look at pgbench.c suggests that changing the doConnect() >function to pick up a random pghost and pgport set whenever it's >called should be enough to get this going. > That's good. May be we can pick those in round robin fashion to get deterministic results. >Regards, >Nikhils > > >> Regards; >> ---------- >> Koichi Suzuki >> >> >> 2012/7/10 Ashutosh Bapat <ash...@en...>: >>> Hi Shankar, >>> Will it be possible for you to change the pgbench code to dynamically fire >>> on all available coordinators? >>> >>> Since we use modified DBT-1 for our benchmarking, we haven't got to the >>> point where we can modify pg_bench to suite XC. But that's something, we >>> will welcome if anybody is interested. >>> >>> >>> On Mon, Jul 9, 2012 at 9:41 PM, Shankar Hariharan >>> <har...@ya...> wrote: >>>> >>>> Thanks Ashutosh. You are right, while running this test i just had pgbench >>>> running against one coordinator. Looks like pgbench by itself may not be an >>>> apt tool for this kind of testing, I will instead run pgbench's underlying >>>> sql script from cmdline against either coordinators. Thanks for that tip. >>>> >>>> I got a lot of input on my problem from a lot of folks on the list, the >>>> feedback is much appreciated. Thanks everybody! >>>> >>>> On max_prepared_transactions, I will factor in the number of coordinators >>>> and the max_connections on each coordinator while arriving at a figure. >>>> Will also try out Koichi Suzuki's suggestion to have multiple NICs on the >>>> GTM. I will post my findings here for the same cluster configuration as >>>> before. >>>> >>>> thanks, >>>> Shankar >>>> >>>> ________________________________ >>>> From: Ashutosh Bapat <ash...@en...> >>>> To: Shankar Hariharan <har...@ya...> >>>> Cc: "pos...@li..." >>>> <pos...@li...> >>>> Sent: Sunday, July 8, 2012 11:02 PM >>>> >>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>> >>>> Hi Shankar, >>>> You have got answers to the prepared transaction problem, I guess. I have >>>> something else below. >>>> >>>> On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan >>>> <har...@ya...> wrote: >>>> >>>> As planned I ran some tests using PGBench on this setup : >>>> >>>> Node 1 - Coord1, Datanode1, gtm-proxy1 >>>> Node 2- Coord2, Datanode2, gtm-proxy2 >>>> Node 3- Datanode3, gtm >>>> >>>> I was connecting via Coord1 for these tests: >>>> - scale factor of 30 used >>>> - tests run using the following input parameters for pgbench: >>>> >>>> >>>> Try connecting to both the coordinators, it should give you better >>>> performance, esp, when you are using distributed tables. 
With distributed >>>> tables, coordinator gets involved in query execution more than that in the >>>> case of replicated tables. So, balancing load across two coordinators would >>>> help. >>>> >>>> >>>> >>>> Clients Threads Duration Transactions >>>> 1 1 100 6204 >>>> 2 2 100 9960 >>>> 4 4 100 12880 >>>> 6 6 100 1676 >>>> >>>> >>>> >>>> 8 >>>> 8 8 100 19758 >>>> 10 10 100 21944 >>>> 12 12 100 20674 >>>> >>>> The run went well until the 8 clients. I started seeing errors on 10 >>>> clients onwards and eventually the 14 client run has been hanging around for >>>> over an hour now. The errors I have been seeing on console are the following >>>> : >>>> >>>> pgbench console : >>>> Client 8 aborted in state 12: ERROR: GTM error, could not obtain snapshot >>>> Client 0 aborted in state 13: ERROR: maximum number of prepared >>>> transactions reached >>>> Client 7 aborted in state 13: ERROR: maximum number of prepared >>>> transactions reached >>>> Client 11 aborted in state 13: ERROR: maximum number of prepared >>>> transactions reached >>>> Client 9 aborted in state 13: ERROR: maximum number of prepared >>>> transactions reached >>>> >>>> node console: >>>> ERROR: GTM error, could not obtain snapshot >>>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >>>> VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); >>>> ERROR: maximum number of prepared transactions reached >>>> HINT: Increase max_prepared_transactions (currently 10). >>>> STATEMENT: PREPARE TRANSACTION 'T201428' >>>> ERROR: maximum number of prepared transactions reached >>>> STATEMENT: END; >>>> ERROR: maximum number of prepared transactions reached >>>> STATEMENT: END; >>>> ERROR: maximum number of prepared transactions reached >>>> STATEMENT: END; >>>> ERROR: maximum number of prepared transactions reached >>>> STATEMENT: END; >>>> ERROR: GTM error, could not obtain snapshot >>>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >>>> VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); >>>> >>>> I was also watching the processes on each node and see the following for >>>> the 14 client run: >>>> >>>> >>>> Node1 : >>>> postgres 25571 10511 0 04:41 ? 00:00:02 postgres: postgres >>>> postgres ::1(33481) TRUNCATE TABLE waiting >>>> postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres >>>> postgres pgbench-address (50388) TRUNCATE TABLE >>>> >>>> Node2: >>>> postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres >>>> postgres coord1-address(57357) idle in transaction >>>> >>>> Node3: >>>> postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres >>>> postgres coord1-address(51406) TRUNCATE TABLE waiting >>>> >>>> >>>> I was going to restart the processes on all nodes and start over but did >>>> not want to lose this data as it could be useful information. >>>> >>>> Any explanation on the above issue is much appreciated. I will try the >>>> next run with a higher value set for max_prepared_transactions. Any >>>> recommendations for a good value on this front? >>>> >>>> thanks, >>>> Shankar >>>> >>>> >>>> ________________________________ >>>> From: Shankar Hariharan <har...@ya...> >>>> To: Ashutosh Bapat <ash...@en...> >>>> Cc: "pos...@li..." >>>> <pos...@li...> >>>> Sent: Friday, July 6, 2012 8:22 AM >>>> >>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>> >>>> Hi Ashutosh, >>>> I was trying to size the load on a server and was wondering if a GTM >>>> could be shared w/o much performance overhead between a small number of >>>> datanodes and coordinators. 
I will post my findings here. >>>> thanks, >>>> Shankar >>>> >>>> ________________________________ >>>> From: Ashutosh Bapat <ash...@en...> >>>> To: Shankar Hariharan <har...@ya...> >>>> Cc: "pos...@li..." >>>> <pos...@li...> >>>> Sent: Friday, July 6, 2012 12:25 AM >>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>> >>>> Hi Shankar, >>>> Running gtm-proxy has shown to improve the performance, because it lessens >>>> the load on GTM, by serving requests locally. Why do you want the >>>> coordinators to connect directly to the GTM? Are you seeing any performance >>>> improvement from doing that? >>>> >>>> On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan >>>> <har...@ya...> wrote: >>>> >>>> Follow up to earlier email. In the setup described below, can I avoid >>>> using a gtm-proxy? That is, can I just simply point coordinators to the one >>>> gtm running on node 3 ? >>>> My initial plan was to just run the gtm on node 3 then I thought I could >>>> try a datanode without a local coordinator which was why I put these two >>>> together on node 3. >>>> thanks, >>>> Shankar >>>> >>>> ________________________________ >>>> From: Shankar Hariharan <har...@ya...> >>>> To: "pos...@li..." >>>> <pos...@li...> >>>> Sent: Thursday, July 5, 2012 11:35 PM >>>> Subject: Question on multiple coordinators >>>> >>>> Hello, >>>> >>>> Am trying out XC 1.0 in the following configuraiton. >>>> Node 1 - Coord1, Datanode1, gtm-proxy1 >>>> Node 2- Coord2, Datanode2, gtm-proxy2 >>>> Node 3- Datanode3, gtm >>>> >>>> I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In >>>> addition I missed the pg_hba edit as well. So the first table T1 that I >>>> created for distribution from Coord1 was not "visible| from Coord2 but was >>>> on all the data nodes. >>>> I tried to get Coord2 backinto business in various ways but the first >>>> table I created refused to show up on Coord2 : >>>> - edit pg_hba and add node on both coord1 and 2. Then run select >>>> pgxc_pool_reload(); >>>> - restart coord 1 and 2 >>>> - drop node c2 from c1 and c1 from c2 and add them back followed by select >>>> pgxc_pool_reload(); >>>> >>>> So I tried to create the same table T1 from Coord2 to observe behavior and >>>> it did not like it clearly as all nodes it "wrote" to reported that the >>>> table already existed which was good. At this point I could understand that >>>> Coord2 and Coord1 are not talking alright so I created a new table from >>>> coord1 with replication. This table was visible from both now. >>>> >>>> Question is should I expect to see the first table, let me call it T1 >>>> after a while from Coord2 also? >>>> >>>> >>>> thanks, >>>> Shankar >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. Discussions >>>> will include endpoint security, mobile security and the latest in malware >>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... 
>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > >-- > >StormDB - http://www.stormdb.com >The Database Cloud > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Nikhil S. <ni...@st...> - 2012-07-10 12:44:45
|
On Tue, Jul 10, 2012 at 8:24 AM, Ashutosh Bapat <ash...@en...> wrote: > > > On Tue, Jul 10, 2012 at 5:46 PM, Nikhil Sontakke <ni...@st...> > wrote: >> >> > Yes. Although we don't have to care application partitioning based >> > upon the distribution key, it's a good idea to make all the >> > coordinator workload as even as possible. >> > >> > In the case of DBT-1, we ran several DBT-1 process, each produces >> > random transaction but goes to specific coordinator. >> > >> > I think pgbench can do the similar. >> > >> >> Well, a quick look at pgbench.c suggests that changing the doConnect() >> function to pick up a random pghost and pgport set whenever it's >> called should be enough to get this going. > > > That's good. May be we can pick those in round robin fashion to get > deterministic results. > Well, that's ok I think. If you are running a decent/large number of transactions, it tends to be squarely spread across with random calls too. Regards, Nikhils >> >> >> Regards, >> Nikhils >> >> > Regards; >> > ---------- >> > Koichi Suzuki >> > >> > >> > 2012/7/10 Ashutosh Bapat <ash...@en...>: >> >> Hi Shankar, >> >> Will it be possible for you to change the pgbench code to dynamically >> >> fire >> >> on all available coordinators? >> >> >> >> Since we use modified DBT-1 for our benchmarking, we haven't got to the >> >> point where we can modify pg_bench to suite XC. But that's something, >> >> we >> >> will welcome if anybody is interested. >> >> >> >> >> >> On Mon, Jul 9, 2012 at 9:41 PM, Shankar Hariharan >> >> <har...@ya...> wrote: >> >>> >> >>> Thanks Ashutosh. You are right, while running this test i just had >> >>> pgbench >> >>> running against one coordinator. Looks like pgbench by itself may not >> >>> be an >> >>> apt tool for this kind of testing, I will instead run pgbench's >> >>> underlying >> >>> sql script from cmdline against either coordinators. Thanks for >> >>> that tip. >> >>> >> >>> I got a lot of input on my problem from a lot of folks on the list, >> >>> the >> >>> feedback is much appreciated. Thanks everybody! >> >>> >> >>> On max_prepared_transactions, I will factor in the number of >> >>> coordinators >> >>> and the max_connections on each coordinator while arriving at a >> >>> figure. >> >>> Will also try out Koichi Suzuki's suggestion to have multiple NICs on >> >>> the >> >>> GTM. I will post my findings here for the same cluster configuration >> >>> as >> >>> before. >> >>> >> >>> thanks, >> >>> Shankar >> >>> >> >>> ________________________________ >> >>> From: Ashutosh Bapat <ash...@en...> >> >>> To: Shankar Hariharan <har...@ya...> >> >>> Cc: "pos...@li..." >> >>> <pos...@li...> >> >>> Sent: Sunday, July 8, 2012 11:02 PM >> >>> >> >>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >>> >> >>> Hi Shankar, >> >>> You have got answers to the prepared transaction problem, I guess. I >> >>> have >> >>> something else below. >> >>> >> >>> On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan >> >>> <har...@ya...> wrote: >> >>> >> >>> As planned I ran some tests using PGBench on this setup : >> >>> >> >>> Node 1 - Coord1, Datanode1, gtm-proxy1 >> >>> Node 2- Coord2, Datanode2, gtm-proxy2 >> >>> Node 3- Datanode3, gtm >> >>> >> >>> I was connecting via Coord1 for these tests: >> >>> - scale factor of 30 used >> >>> - tests run using the following input parameters for pgbench: >> >>> >> >>> >> >>> Try connecting to both the coordinators, it should give you better >> >>> performance, esp, when you are using distributed tables. 
With >> >>> distributed >> >>> tables, coordinator gets involved in query execution more than that in >> >>> the >> >>> case of replicated tables. So, balancing load across two coordinators >> >>> would >> >>> help. >> >>> >> >>> >> >>> >> >>> Clients Threads Duration Transactions >> >>> 1 1 100 6204 >> >>> 2 2 100 9960 >> >>> 4 4 100 12880 >> >>> 6 6 100 1676 >> >>> >> >>> >> >>> >> >>> 8 >> >>> 8 8 100 19758 >> >>> 10 10 100 21944 >> >>> 12 12 100 20674 >> >>> >> >>> The run went well until the 8 clients. I started seeing errors on 10 >> >>> clients onwards and eventually the 14 client run has been hanging >> >>> around for >> >>> over an hour now. The errors I have been seeing on console are the >> >>> following >> >>> : >> >>> >> >>> pgbench console : >> >>> Client 8 aborted in state 12: ERROR: GTM error, could not obtain >> >>> snapshot >> >>> Client 0 aborted in state 13: ERROR: maximum number of prepared >> >>> transactions reached >> >>> Client 7 aborted in state 13: ERROR: maximum number of prepared >> >>> transactions reached >> >>> Client 11 aborted in state 13: ERROR: maximum number of prepared >> >>> transactions reached >> >>> Client 9 aborted in state 13: ERROR: maximum number of prepared >> >>> transactions reached >> >>> >> >>> node console: >> >>> ERROR: GTM error, could not obtain snapshot >> >>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >> >>> VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); >> >>> ERROR: maximum number of prepared transactions reached >> >>> HINT: Increase max_prepared_transactions (currently 10). >> >>> STATEMENT: PREPARE TRANSACTION 'T201428' >> >>> ERROR: maximum number of prepared transactions reached >> >>> STATEMENT: END; >> >>> ERROR: maximum number of prepared transactions reached >> >>> STATEMENT: END; >> >>> ERROR: maximum number of prepared transactions reached >> >>> STATEMENT: END; >> >>> ERROR: maximum number of prepared transactions reached >> >>> STATEMENT: END; >> >>> ERROR: GTM error, could not obtain snapshot >> >>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >> >>> VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); >> >>> >> >>> I was also watching the processes on each node and see the following >> >>> for >> >>> the 14 client run: >> >>> >> >>> >> >>> Node1 : >> >>> postgres 25571 10511 0 04:41 ? 00:00:02 postgres: postgres >> >>> postgres ::1(33481) TRUNCATE TABLE waiting >> >>> postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres >> >>> postgres pgbench-address (50388) TRUNCATE TABLE >> >>> >> >>> Node2: >> >>> postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres >> >>> postgres coord1-address(57357) idle in transaction >> >>> >> >>> Node3: >> >>> postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres >> >>> postgres coord1-address(51406) TRUNCATE TABLE waiting >> >>> >> >>> >> >>> I was going to restart the processes on all nodes and start over but >> >>> did >> >>> not want to lose this data as it could be useful information. >> >>> >> >>> Any explanation on the above issue is much appreciated. I will try the >> >>> next run with a higher value set for max_prepared_transactions. Any >> >>> recommendations for a good value on this front? >> >>> >> >>> thanks, >> >>> Shankar >> >>> >> >>> >> >>> ________________________________ >> >>> From: Shankar Hariharan <har...@ya...> >> >>> To: Ashutosh Bapat <ash...@en...> >> >>> Cc: "pos...@li..." 
>> >>> <pos...@li...> >> >>> Sent: Friday, July 6, 2012 8:22 AM >> >>> >> >>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >>> >> >>> Hi Ashutosh, >> >>> I was trying to size the load on a server and was wondering if a GTM >> >>> could be shared w/o much performance overhead between a small number >> >>> of >> >>> datanodes and coordinators. I will post my findings here. >> >>> thanks, >> >>> Shankar >> >>> >> >>> ________________________________ >> >>> From: Ashutosh Bapat <ash...@en...> >> >>> To: Shankar Hariharan <har...@ya...> >> >>> Cc: "pos...@li..." >> >>> <pos...@li...> >> >>> Sent: Friday, July 6, 2012 12:25 AM >> >>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >>> >> >>> Hi Shankar, >> >>> Running gtm-proxy has shown to improve the performance, because it >> >>> lessens >> >>> the load on GTM, by serving requests locally. Why do you want the >> >>> coordinators to connect directly to the GTM? Are you seeing any >> >>> performance >> >>> improvement from doing that? >> >>> >> >>> On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan >> >>> <har...@ya...> wrote: >> >>> >> >>> Follow up to earlier email. In the setup described below, can I avoid >> >>> using a gtm-proxy? That is, can I just simply point coordinators to >> >>> the one >> >>> gtm running on node 3 ? >> >>> My initial plan was to just run the gtm on node 3 then I thought I >> >>> could >> >>> try a datanode without a local coordinator which was why I put these >> >>> two >> >>> together on node 3. >> >>> thanks, >> >>> Shankar >> >>> >> >>> ________________________________ >> >>> From: Shankar Hariharan <har...@ya...> >> >>> To: "pos...@li..." >> >>> <pos...@li...> >> >>> Sent: Thursday, July 5, 2012 11:35 PM >> >>> Subject: Question on multiple coordinators >> >>> >> >>> Hello, >> >>> >> >>> Am trying out XC 1.0 in the following configuraiton. >> >>> Node 1 - Coord1, Datanode1, gtm-proxy1 >> >>> Node 2- Coord2, Datanode2, gtm-proxy2 >> >>> Node 3- Datanode3, gtm >> >>> >> >>> I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. >> >>> In >> >>> addition I missed the pg_hba edit as well. So the first table T1 that >> >>> I >> >>> created for distribution from Coord1 was not "visible| from Coord2 but >> >>> was >> >>> on all the data nodes. >> >>> I tried to get Coord2 backinto business in various ways but the first >> >>> table I created refused to show up on Coord2 : >> >>> - edit pg_hba and add node on both coord1 and 2. Then run select >> >>> pgxc_pool_reload(); >> >>> - restart coord 1 and 2 >> >>> - drop node c2 from c1 and c1 from c2 and add them back followed by >> >>> select >> >>> pgxc_pool_reload(); >> >>> >> >>> So I tried to create the same table T1 from Coord2 to observe behavior >> >>> and >> >>> it did not like it clearly as all nodes it "wrote" to reported that >> >>> the >> >>> table already existed which was good. At this point I could understand >> >>> that >> >>> Coord2 and Coord1 are not talking alright so I created a new table >> >>> from >> >>> coord1 with replication. This table was visible from both now. >> >>> >> >>> Question is should I expect to see the first table, let me call it T1 >> >>> after a while from Coord2 also? 
>> >>> >> >>> >> >>> thanks, >> >>> Shankar >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> ------------------------------------------------------------------------------ >> >>> Live Security Virtual Conference >> >>> Exclusive live event will cover all the ways today's security and >> >>> threat landscape has changed and how IT managers can respond. >> >>> Discussions >> >>> will include endpoint security, mobile security and the latest in >> >>> malware >> >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>> _______________________________________________ >> >>> Postgres-xc-developers mailing list >> >>> Pos...@li... >> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>> >> >>> >> >>> >> >>> >> >>> -- >> >>> Best Wishes, >> >>> Ashutosh Bapat >> >>> EntepriseDB Corporation >> >>> The Enterprise Postgres Company >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> -- >> >>> Best Wishes, >> >>> Ashutosh Bapat >> >>> EntepriseDB Corporation >> >>> The Enterprise Postgres Company >> >>> >> >>> >> >>> >> >> >> >> >> >> >> >> -- >> >> Best Wishes, >> >> Ashutosh Bapat >> >> EntepriseDB Corporation >> >> The Enterprise Postgres Company >> >> >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> Live Security Virtual Conference >> >> Exclusive live event will cover all the ways today's security and >> >> threat landscape has changed and how IT managers can respond. >> >> Discussions >> >> will include endpoint security, mobile security and the latest in >> >> malware >> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >> _______________________________________________ >> >> Postgres-xc-developers mailing list >> >> Pos...@li... >> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> >> > >> > >> > ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> > Discussions >> > will include endpoint security, mobile security and the latest in >> > malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> >> >> -- >> StormDB - http://www.stormdb.com >> The Database Cloud > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > -- StormDB - http://www.stormdb.com The Database Cloud |
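A minimal sketch of the doConnect() change discussed above, assuming the coordinator host/port pairs have already been collected into two arrays by some new pgbench option. The coord_host, coord_port, n_coords and rr_cursor names are illustrative assumptions, not taken from the real pgbench.c, and the arrays are assumed to hold at least one entry.

/*
 * Hypothetical sketch only: pick a coordinator per connection instead of
 * always using -h/-p.  All names below are assumptions for illustration.
 */
#include <stdio.h>
#include <libpq-fe.h>

#define MAX_COORDS 16

static char *coord_host[MAX_COORDS];   /* e.g. "ip1", "ip2", "ip3"        */
static char *coord_port[MAX_COORDS];   /* e.g. "5678", "5432", "3333"     */
static int   n_coords;                 /* entries filled in by the option */
static int   rr_cursor;                /* round-robin position            */

static PGconn *
doConnect(void)
{
    /* Round robin gives a deterministic spread; rand() % n_coords works
     * too and evens out over a large number of transactions. */
    int      i = rr_cursor++ % n_coords;
    PGconn  *conn = PQsetdbLogin(coord_host[i], coord_port[i],
                                 NULL, NULL, "pgbench", NULL, NULL);

    if (PQstatus(conn) == CONNECTION_BAD)
    {
        fprintf(stderr, "connection to %s:%s failed: %s",
                coord_host[i], coord_port[i], PQerrorMessage(conn));
        PQfinish(conn);
        return NULL;
    }
    return conn;
}

A real patch would of course modify the existing doConnect() in contrib/pgbench/pgbench.c and keep the current single-host behaviour when no coordinator list is given.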
From: Nikhil S. <ni...@st...> - 2012-07-10 12:39:45
> Yes. Although we don't have to care application partitioning based > upon the distribution key, it's a good idea to make all the > coordinator workload as even as possible. > > In the case of DBT-1, we ran several DBT-1 process, each produces > random transaction but goes to specific coordinator. > > I think pgbench can do the similar. > Well, a quick look at pgbench.c suggests that changing the doConnect() function to pick up a random pghost and pgport set whenever it's called should be enough to get this going. Regards, Nikhils > Regards; > ---------- > Koichi Suzuki > > > 2012/7/10 Ashutosh Bapat <ash...@en...>: >> Hi Shankar, >> Will it be possible for you to change the pgbench code to dynamically fire >> on all available coordinators? >> >> Since we use modified DBT-1 for our benchmarking, we haven't got to the >> point where we can modify pg_bench to suite XC. But that's something, we >> will welcome if anybody is interested. >> >> >> On Mon, Jul 9, 2012 at 9:41 PM, Shankar Hariharan >> <har...@ya...> wrote: >>> >>> Thanks Ashutosh. You are right, while running this test i just had pgbench >>> running against one coordinator. Looks like pgbench by itself may not be an >>> apt tool for this kind of testing, I will instead run pgbench's underlying >>> sql script from cmdline against either coordinators. Thanks for that tip. >>> >>> I got a lot of input on my problem from a lot of folks on the list, the >>> feedback is much appreciated. Thanks everybody! >>> >>> On max_prepared_transactions, I will factor in the number of coordinators >>> and the max_connections on each coordinator while arriving at a figure. >>> Will also try out Koichi Suzuki's suggestion to have multiple NICs on the >>> GTM. I will post my findings here for the same cluster configuration as >>> before. >>> >>> thanks, >>> Shankar >>> >>> ________________________________ >>> From: Ashutosh Bapat <ash...@en...> >>> To: Shankar Hariharan <har...@ya...> >>> Cc: "pos...@li..." >>> <pos...@li...> >>> Sent: Sunday, July 8, 2012 11:02 PM >>> >>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>> >>> Hi Shankar, >>> You have got answers to the prepared transaction problem, I guess. I have >>> something else below. >>> >>> On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan >>> <har...@ya...> wrote: >>> >>> As planned I ran some tests using PGBench on this setup : >>> >>> Node 1 - Coord1, Datanode1, gtm-proxy1 >>> Node 2- Coord2, Datanode2, gtm-proxy2 >>> Node 3- Datanode3, gtm >>> >>> I was connecting via Coord1 for these tests: >>> - scale factor of 30 used >>> - tests run using the following input parameters for pgbench: >>> >>> >>> Try connecting to both the coordinators, it should give you better >>> performance, esp, when you are using distributed tables. With distributed >>> tables, coordinator gets involved in query execution more than that in the >>> case of replicated tables. So, balancing load across two coordinators would >>> help. >>> >>> >>> >>> Clients Threads Duration Transactions >>> 1 1 100 6204 >>> 2 2 100 9960 >>> 4 4 100 12880 >>> 6 6 100 1676 >>> >>> >>> >>> 8 >>> 8 8 100 19758 >>> 10 10 100 21944 >>> 12 12 100 20674 >>> >>> The run went well until the 8 clients. I started seeing errors on 10 >>> clients onwards and eventually the 14 client run has been hanging around for >>> over an hour now. 
The errors I have been seeing on console are the following >>> : >>> >>> pgbench console : >>> Client 8 aborted in state 12: ERROR: GTM error, could not obtain snapshot >>> Client 0 aborted in state 13: ERROR: maximum number of prepared >>> transactions reached >>> Client 7 aborted in state 13: ERROR: maximum number of prepared >>> transactions reached >>> Client 11 aborted in state 13: ERROR: maximum number of prepared >>> transactions reached >>> Client 9 aborted in state 13: ERROR: maximum number of prepared >>> transactions reached >>> >>> node console: >>> ERROR: GTM error, could not obtain snapshot >>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >>> VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); >>> ERROR: maximum number of prepared transactions reached >>> HINT: Increase max_prepared_transactions (currently 10). >>> STATEMENT: PREPARE TRANSACTION 'T201428' >>> ERROR: maximum number of prepared transactions reached >>> STATEMENT: END; >>> ERROR: maximum number of prepared transactions reached >>> STATEMENT: END; >>> ERROR: maximum number of prepared transactions reached >>> STATEMENT: END; >>> ERROR: maximum number of prepared transactions reached >>> STATEMENT: END; >>> ERROR: GTM error, could not obtain snapshot >>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >>> VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); >>> >>> I was also watching the processes on each node and see the following for >>> the 14 client run: >>> >>> >>> Node1 : >>> postgres 25571 10511 0 04:41 ? 00:00:02 postgres: postgres >>> postgres ::1(33481) TRUNCATE TABLE waiting >>> postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres >>> postgres pgbench-address (50388) TRUNCATE TABLE >>> >>> Node2: >>> postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres >>> postgres coord1-address(57357) idle in transaction >>> >>> Node3: >>> postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres >>> postgres coord1-address(51406) TRUNCATE TABLE waiting >>> >>> >>> I was going to restart the processes on all nodes and start over but did >>> not want to lose this data as it could be useful information. >>> >>> Any explanation on the above issue is much appreciated. I will try the >>> next run with a higher value set for max_prepared_transactions. Any >>> recommendations for a good value on this front? >>> >>> thanks, >>> Shankar >>> >>> >>> ________________________________ >>> From: Shankar Hariharan <har...@ya...> >>> To: Ashutosh Bapat <ash...@en...> >>> Cc: "pos...@li..." >>> <pos...@li...> >>> Sent: Friday, July 6, 2012 8:22 AM >>> >>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>> >>> Hi Ashutosh, >>> I was trying to size the load on a server and was wondering if a GTM >>> could be shared w/o much performance overhead between a small number of >>> datanodes and coordinators. I will post my findings here. >>> thanks, >>> Shankar >>> >>> ________________________________ >>> From: Ashutosh Bapat <ash...@en...> >>> To: Shankar Hariharan <har...@ya...> >>> Cc: "pos...@li..." >>> <pos...@li...> >>> Sent: Friday, July 6, 2012 12:25 AM >>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>> >>> Hi Shankar, >>> Running gtm-proxy has shown to improve the performance, because it lessens >>> the load on GTM, by serving requests locally. Why do you want the >>> coordinators to connect directly to the GTM? Are you seeing any performance >>> improvement from doing that? 
>>> >>> On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan >>> <har...@ya...> wrote: >>> >>> Follow up to earlier email. In the setup described below, can I avoid >>> using a gtm-proxy? That is, can I just simply point coordinators to the one >>> gtm running on node 3 ? >>> My initial plan was to just run the gtm on node 3 then I thought I could >>> try a datanode without a local coordinator which was why I put these two >>> together on node 3. >>> thanks, >>> Shankar >>> >>> ________________________________ >>> From: Shankar Hariharan <har...@ya...> >>> To: "pos...@li..." >>> <pos...@li...> >>> Sent: Thursday, July 5, 2012 11:35 PM >>> Subject: Question on multiple coordinators >>> >>> Hello, >>> >>> Am trying out XC 1.0 in the following configuraiton. >>> Node 1 - Coord1, Datanode1, gtm-proxy1 >>> Node 2- Coord2, Datanode2, gtm-proxy2 >>> Node 3- Datanode3, gtm >>> >>> I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In >>> addition I missed the pg_hba edit as well. So the first table T1 that I >>> created for distribution from Coord1 was not "visible| from Coord2 but was >>> on all the data nodes. >>> I tried to get Coord2 backinto business in various ways but the first >>> table I created refused to show up on Coord2 : >>> - edit pg_hba and add node on both coord1 and 2. Then run select >>> pgxc_pool_reload(); >>> - restart coord 1 and 2 >>> - drop node c2 from c1 and c1 from c2 and add them back followed by select >>> pgxc_pool_reload(); >>> >>> So I tried to create the same table T1 from Coord2 to observe behavior and >>> it did not like it clearly as all nodes it "wrote" to reported that the >>> table already existed which was good. At this point I could understand that >>> Coord2 and Coord1 are not talking alright so I created a new table from >>> coord1 with replication. This table was visible from both now. >>> >>> Question is should I expect to see the first table, let me call it T1 >>> after a while from Coord2 also? >>> >>> >>> thanks, >>> Shankar >>> >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers -- StormDB - http://www.stormdb.com The Database Cloud |
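On the "maximum number of prepared transactions reached" errors quoted above: because implicit two-phase commit means every coordinator backend can leave a prepared transaction on each node it writes to, a rough floor is (number of coordinators) times max_connections. A sketch of the check, with purely illustrative numbers:

-- Illustrative numbers only; size this for your own cluster.
SHOW max_connections;              -- e.g. 100 on each of the two coordinators
SHOW max_prepared_transactions;    -- the default of 10 produces the errors above

-- In postgresql.conf on every datanode (and coordinator), something like:
--     max_prepared_transactions = 200    -- 2 coordinators * 100 connections

Note that max_prepared_transactions only takes effect after a restart of the node it is set on.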
From: Ashutosh B. <ash...@en...> - 2012-07-10 12:24:41
On Tue, Jul 10, 2012 at 5:46 PM, Nikhil Sontakke <ni...@st...>wrote: > > Yes. Although we don't have to care application partitioning based > > upon the distribution key, it's a good idea to make all the > > coordinator workload as even as possible. > > > > In the case of DBT-1, we ran several DBT-1 process, each produces > > random transaction but goes to specific coordinator. > > > > I think pgbench can do the similar. > > > > Well, a quick look at pgbench.c suggests that changing the doConnect() > function to pick up a random pghost and pgport set whenever it's > called should be enough to get this going. > That's good. May be we can pick those in round robin fashion to get deterministic results. > > Regards, > Nikhils > > > Regards; > > ---------- > > Koichi Suzuki > > > > > > 2012/7/10 Ashutosh Bapat <ash...@en...>: > >> Hi Shankar, > >> Will it be possible for you to change the pgbench code to dynamically > fire > >> on all available coordinators? > >> > >> Since we use modified DBT-1 for our benchmarking, we haven't got to the > >> point where we can modify pg_bench to suite XC. But that's something, we > >> will welcome if anybody is interested. > >> > >> > >> On Mon, Jul 9, 2012 at 9:41 PM, Shankar Hariharan > >> <har...@ya...> wrote: > >>> > >>> Thanks Ashutosh. You are right, while running this test i just had > pgbench > >>> running against one coordinator. Looks like pgbench by itself may not > be an > >>> apt tool for this kind of testing, I will instead run pgbench's > underlying > >>> sql script from cmdline against either coordinators. Thanks for > that tip. > >>> > >>> I got a lot of input on my problem from a lot of folks on the list, the > >>> feedback is much appreciated. Thanks everybody! > >>> > >>> On max_prepared_transactions, I will factor in the number of > coordinators > >>> and the max_connections on each coordinator while arriving at a figure. > >>> Will also try out Koichi Suzuki's suggestion to have multiple NICs on > the > >>> GTM. I will post my findings here for the same cluster configuration > as > >>> before. > >>> > >>> thanks, > >>> Shankar > >>> > >>> ________________________________ > >>> From: Ashutosh Bapat <ash...@en...> > >>> To: Shankar Hariharan <har...@ya...> > >>> Cc: "pos...@li..." > >>> <pos...@li...> > >>> Sent: Sunday, July 8, 2012 11:02 PM > >>> > >>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy > >>> > >>> Hi Shankar, > >>> You have got answers to the prepared transaction problem, I guess. I > have > >>> something else below. > >>> > >>> On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan > >>> <har...@ya...> wrote: > >>> > >>> As planned I ran some tests using PGBench on this setup : > >>> > >>> Node 1 - Coord1, Datanode1, gtm-proxy1 > >>> Node 2- Coord2, Datanode2, gtm-proxy2 > >>> Node 3- Datanode3, gtm > >>> > >>> I was connecting via Coord1 for these tests: > >>> - scale factor of 30 used > >>> - tests run using the following input parameters for pgbench: > >>> > >>> > >>> Try connecting to both the coordinators, it should give you better > >>> performance, esp, when you are using distributed tables. With > distributed > >>> tables, coordinator gets involved in query execution more than that in > the > >>> case of replicated tables. So, balancing load across two coordinators > would > >>> help. 
> >>> > >>> > >>> > >>> Clients Threads Duration Transactions > >>> 1 1 100 6204 > >>> 2 2 100 9960 > >>> 4 4 100 12880 > >>> 6 6 100 1676 > >>> > >>> > >>> > >>> 8 > >>> 8 8 100 19758 > >>> 10 10 100 21944 > >>> 12 12 100 20674 > >>> > >>> The run went well until the 8 clients. I started seeing errors on 10 > >>> clients onwards and eventually the 14 client run has been hanging > around for > >>> over an hour now. The errors I have been seeing on console are the > following > >>> : > >>> > >>> pgbench console : > >>> Client 8 aborted in state 12: ERROR: GTM error, could not obtain > snapshot > >>> Client 0 aborted in state 13: ERROR: maximum number of prepared > >>> transactions reached > >>> Client 7 aborted in state 13: ERROR: maximum number of prepared > >>> transactions reached > >>> Client 11 aborted in state 13: ERROR: maximum number of prepared > >>> transactions reached > >>> Client 9 aborted in state 13: ERROR: maximum number of prepared > >>> transactions reached > >>> > >>> node console: > >>> ERROR: GTM error, could not obtain snapshot > >>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) > >>> VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); > >>> ERROR: maximum number of prepared transactions reached > >>> HINT: Increase max_prepared_transactions (currently 10). > >>> STATEMENT: PREPARE TRANSACTION 'T201428' > >>> ERROR: maximum number of prepared transactions reached > >>> STATEMENT: END; > >>> ERROR: maximum number of prepared transactions reached > >>> STATEMENT: END; > >>> ERROR: maximum number of prepared transactions reached > >>> STATEMENT: END; > >>> ERROR: maximum number of prepared transactions reached > >>> STATEMENT: END; > >>> ERROR: GTM error, could not obtain snapshot > >>> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) > >>> VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); > >>> > >>> I was also watching the processes on each node and see the following > for > >>> the 14 client run: > >>> > >>> > >>> Node1 : > >>> postgres 25571 10511 0 04:41 ? 00:00:02 postgres: postgres > >>> postgres ::1(33481) TRUNCATE TABLE waiting > >>> postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres > >>> postgres pgbench-address (50388) TRUNCATE TABLE > >>> > >>> Node2: > >>> postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres > >>> postgres coord1-address(57357) idle in transaction > >>> > >>> Node3: > >>> postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres > >>> postgres coord1-address(51406) TRUNCATE TABLE waiting > >>> > >>> > >>> I was going to restart the processes on all nodes and start over but > did > >>> not want to lose this data as it could be useful information. > >>> > >>> Any explanation on the above issue is much appreciated. I will try the > >>> next run with a higher value set for max_prepared_transactions. Any > >>> recommendations for a good value on this front? > >>> > >>> thanks, > >>> Shankar > >>> > >>> > >>> ________________________________ > >>> From: Shankar Hariharan <har...@ya...> > >>> To: Ashutosh Bapat <ash...@en...> > >>> Cc: "pos...@li..." > >>> <pos...@li...> > >>> Sent: Friday, July 6, 2012 8:22 AM > >>> > >>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy > >>> > >>> Hi Ashutosh, > >>> I was trying to size the load on a server and was wondering if a GTM > >>> could be shared w/o much performance overhead between a small number of > >>> datanodes and coordinators. I will post my findings here. 
> >>> thanks, > >>> Shankar > >>> > >>> ________________________________ > >>> From: Ashutosh Bapat <ash...@en...> > >>> To: Shankar Hariharan <har...@ya...> > >>> Cc: "pos...@li..." > >>> <pos...@li...> > >>> Sent: Friday, July 6, 2012 12:25 AM > >>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy > >>> > >>> Hi Shankar, > >>> Running gtm-proxy has shown to improve the performance, because it > lessens > >>> the load on GTM, by serving requests locally. Why do you want the > >>> coordinators to connect directly to the GTM? Are you seeing any > performance > >>> improvement from doing that? > >>> > >>> On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan > >>> <har...@ya...> wrote: > >>> > >>> Follow up to earlier email. In the setup described below, can I avoid > >>> using a gtm-proxy? That is, can I just simply point coordinators to > the one > >>> gtm running on node 3 ? > >>> My initial plan was to just run the gtm on node 3 then I thought I > could > >>> try a datanode without a local coordinator which was why I put these > two > >>> together on node 3. > >>> thanks, > >>> Shankar > >>> > >>> ________________________________ > >>> From: Shankar Hariharan <har...@ya...> > >>> To: "pos...@li..." > >>> <pos...@li...> > >>> Sent: Thursday, July 5, 2012 11:35 PM > >>> Subject: Question on multiple coordinators > >>> > >>> Hello, > >>> > >>> Am trying out XC 1.0 in the following configuraiton. > >>> Node 1 - Coord1, Datanode1, gtm-proxy1 > >>> Node 2- Coord2, Datanode2, gtm-proxy2 > >>> Node 3- Datanode3, gtm > >>> > >>> I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In > >>> addition I missed the pg_hba edit as well. So the first table T1 that I > >>> created for distribution from Coord1 was not "visible| from Coord2 but > was > >>> on all the data nodes. > >>> I tried to get Coord2 backinto business in various ways but the first > >>> table I created refused to show up on Coord2 : > >>> - edit pg_hba and add node on both coord1 and 2. Then run select > >>> pgxc_pool_reload(); > >>> - restart coord 1 and 2 > >>> - drop node c2 from c1 and c1 from c2 and add them back followed by > select > >>> pgxc_pool_reload(); > >>> > >>> So I tried to create the same table T1 from Coord2 to observe behavior > and > >>> it did not like it clearly as all nodes it "wrote" to reported that the > >>> table already existed which was good. At this point I could understand > that > >>> Coord2 and Coord1 are not talking alright so I created a new table from > >>> coord1 with replication. This table was visible from both now. > >>> > >>> Question is should I expect to see the first table, let me call it T1 > >>> after a while from Coord2 also? > >>> > >>> > >>> thanks, > >>> Shankar > >>> > >>> > >>> > >>> > >>> > ------------------------------------------------------------------------------ > >>> Live Security Virtual Conference > >>> Exclusive live event will cover all the ways today's security and > >>> threat landscape has changed and how IT managers can respond. > Discussions > >>> will include endpoint security, mobile security and the latest in > malware > >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >>> _______________________________________________ > >>> Postgres-xc-developers mailing list > >>> Pos...@li... 
> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >>> > >>> > >>> > >>> > >>> -- > >>> Best Wishes, > >>> Ashutosh Bapat > >>> EntepriseDB Corporation > >>> The Enterprise Postgres Company > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> -- > >>> Best Wishes, > >>> Ashutosh Bapat > >>> EntepriseDB Corporation > >>> The Enterprise Postgres Company > >>> > >>> > >>> > >> > >> > >> > >> -- > >> Best Wishes, > >> Ashutosh Bapat > >> EntepriseDB Corporation > >> The Enterprise Postgres Company > >> > >> > >> > ------------------------------------------------------------------------------ > >> Live Security Virtual Conference > >> Exclusive live event will cover all the ways today's security and > >> threat landscape has changed and how IT managers can respond. > Discussions > >> will include endpoint security, mobile security and the latest in > malware > >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> _______________________________________________ > >> Postgres-xc-developers mailing list > >> Pos...@li... > >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >> > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Postgres-xc-developers mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > -- > StormDB - http://www.stormdb.com > The Database Cloud > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
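On the coordinator-registration problem quoted further down in this thread, the usual repair sequence is to declare the missing coordinator on each side and then refresh the pooler; the host names and ports below are placeholders only.

-- Run on coord1; the mirror-image commands go on coord2.  Host/port
-- values are placeholders for this example.
CREATE NODE coord2 WITH (TYPE = 'coordinator', HOST = 'node2-host', PORT = 5432);
-- If the node is already declared but with wrong values:
-- ALTER NODE coord2 WITH (HOST = 'node2-host', PORT = 5432);
SELECT pgxc_pool_reload();   -- make the pooler pick up the change

This does not back-fill DDL that was run while the coordinators were not talking to each other; a table created only through coord1 still has no catalog entry on coord2 and has to be created there as well.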
From: Ashutosh B. <ash...@en...> - 2012-07-10 12:06:04
Ok. On Tue, Jul 10, 2012 at 5:32 PM, Michael Paquier <mic...@gm...>wrote: > > > On Tue, Jul 10, 2012 at 9:01 PM, Michael Paquier < > mic...@gm...> wrote: > >> >> >> On Tue, Jul 10, 2012 at 8:56 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> Hi Michael, >>> >>> The patch looks good. Here is one comment I have - >>> Is there a real need to add rule OptDistributeByInternal? The token >>> OptDistributeByInternal is being used only once. Same is the case with >>> OptSubClusterInternal. >>> >> Thanks. >> Defining OptDistributeByInternal avoids conflicts bison shift-reduce >> conflicts with empty fields if we plug in that extension somewhere else >> with other queries. >> > Sorry I just meant "shift-reduce conflicts with bison". It typed too > quickly here. :) > >> Hence, it is used as an extension to the list of available commands of >> ALTER TABLE in its new grammar. >> >> >> >>> >>> >>> On Tue, Jul 10, 2012 at 8:27 AM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi Ashutosh, >>>> >>>> Please find attached the wanted patches: >>>> 1) 20120710_grammar.patch, refactoring the grammar >>>> 2) 20120710_refactor.patch, refactoring CREATE TABLE code for functions >>>> related to distribution >>>> 3) 20120710_remotecopy.patch, refactoring the COPY code into remote >>>> COPY. >>>> In order to simplify my maintenance work and yours, I think you should >>>> have a look at those patches before looking at the redistribution work. >>>> Those patches are really simple, have no whitespace, no warnings, are >>>> independant to each other, and each of them is essential for the >>>> redistribution algorithm. >>>> As they are really simple, please let's accelerate the review of those >>>> 3 ones, commit them and move to the heart of the discussions. >>>> >>>> Thanks in advance. >>>> >>>> >>>> On Mon, Jul 9, 2012 at 9:22 PM, Ashutosh Bapat < >>>> ash...@en...> wrote: >>>> >>>>> >>>>> >>>>> On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier < >>>>> mic...@gm...> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < >>>>>> ash...@en...> wrote: >>>>>> >>>>>>> Hi Michael, >>>>>>> I had a look at the patch. I mainly focused on the overall content >>>>>>> of the patch and importantly tests. Before I look at the redistribution >>>>>>> code thoroughly, I have few comments. >>>>>>> >>>>>>> There are many trailing white spaces in the patch. Please fix those, >>>>>>> they unnecessarily fail the automatic merges sometimes. You can do that >>>>>>> when you commit the patch. >>>>>>> >>>>>> Oh OK, I didn't notice. Do you have some places particularly in mind? >>>>>> >>>>> >>>>> Apply your patch on clean repository using git apply and it will show >>>>> you. >>>>> >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Code >>>>>>> ==== >>>>>>> 1. There is a lot of code, which is refactoring existing code, >>>>>>> renaming functions, which is not necessarily related to redistribution >>>>>>> work. Can you please provide separate patches for this refactoring? We >>>>>>> should commit them separately. For example build_subcluster_data() has been >>>>>>> renamed (for good may be), but it makes sense if we do it separately. >>>>>>> Someone looking at the ALTER TABLE commit should not get overwhelmed by the >>>>>>> extraneous changes. >>>>>>> >>>>>> OK. The problem with the functions currently on master was that there >>>>>> name was not really generic and sometimes did not reflect their real >>>>>> functionality. 
So as now the plan is to use them in a more general way, I >>>>>> think there name is not going to change anymore. >>>>>> >>>>>> >>>>>>> >>>>>>> 2. Same is the case with the grammar changes. Please separate the >>>>>>> grammar changes related to pgxc_nodelist etc. into separate patch, although >>>>>>> it's because of ALTER TABLE you need to do those changes. >>>>>>> >>>>>> OK understood. >>>>>> >>>>>> >>>>>>> >>>>>>> Please get these patches reviewed as well, since I haven't looked at >>>>>>> the changes proper. >>>>>>> >>>>>> Understood, I'll make those 2 patches on tomorrow morning, not a big >>>>>> deal. >>>>>> >>>>>> >>>>>>> >>>>>>> Tests >>>>>>> ===== >>>>>>> 1. There is no need to test with huge data, that slows down >>>>>>> regression. For performance testing, you can create a separate test (not to >>>>>>> be included in regression), if you want. >>>>>>> >>>>>> That may be an idea. However you are right I'll limit the number of >>>>>> rows tested. >>>>>> >>>>>> >>>>>> >>>>>>> 2. We need tests, which will test the plan cache (in)validation upon >>>>>>> redistribution of data, tests for testing existing views working after the >>>>>>> redistribution. Please take a look at the PG alter table test for more such >>>>>>> scenarios. >>>>>> >>>>>> OK I'll add those scenarios. They will be included in xc_alter_table. >>>>>> >>>>>> >>>>>>> If you happen to add some performance tests, it would be also good >>>>>>> to test the sanity of concurrent transactions accessing the object/s being >>>>>>> redistributed. It's vital considering that such redistribution would run >>>>>>> for longer. >>>>>>> >>>>>> Yes, it would be nice to >>>>>> >>>>>> >>>>>> >>>>>>> 3. Instead of relying on count(*) to show sanity of the >>>>>>> redistributed data, you may use better aggregates like array_agg or sum(), >>>>>>> avg() and count(). I would prefer array_agg over others, since you can list >>>>>>> all the data values there. You will need aggregate's order by clause (Not >>>>>>> that of the SELECT). >>>>>>> 4. In the case of redistribution of table with index, you will need >>>>>>> to check the sanity of index after the redistribution by some means. >>>>>>> >>>>>> Do you have an idea of how to do that? Pick up some tests from >>>>>> postgres? >>>>>> >>>>> >>>>> Good question. But I don't have an answer (specifically for XC, since >>>>> the indexes are on datanodes). >>>>> >>>>> >>>>>> >>>>>> >>>>>>> 5. I did not understand the significance of the tests where you add >>>>>>> and drop column and redistribute the data. The SELECT after the >>>>>>> redistribution is not testing anything specific for the added/dropped >>>>>>> column. >>>>>>> >>>>>> The internal, let's say default layer, of distribution mechanism uses >>>>>> an internal COPY and it is important to do this check and correctly bypass >>>>>> the columns that are dropped. The SELECT is just here to check that data >>>>>> has been redistributed correctly. >>>>>> >>>>>> >>>>>> >>>>>>> 6. There are no testcases which would change the distribution type >>>>>>> and node list at the same time. Please add those. (I am assuming that these >>>>>>> two operations are possible together). >>>>>>> >>>>>> Yeah sorry, I have been working on that today and added some >>>>>> additional tests that can do that. >>>>>> They are in the bucket, just I didn't send the absolutely latest >>>>>> version. >>>>>> >>>>>> >>>>>>> 7. Negative testcases need to improved. >>>>>>> >>>>>> What are the negative test cases? It would be cool if you could >>>>>> precise. 
>>>>>> >>>>> >>>>> Tests which do negative testing ( >>>>> http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) >>>>> >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Additional feature >>>>>>> ================== >>>>>>> It will be helpful to add the distribution information in the output >>>>>>> of \d command for tables. It will be good tool for tests to check whether >>>>>>> the catalogs have been updated correctly or not. Please add this feature >>>>>>> before we complete ALTER TABLE. It shouldn't take much time. Please provide >>>>>>> this as a separate patch. >>>>>>> >>>>>> +1. >>>>>> This is a good idea, and I recall we had this discussion a couple of >>>>>> months ago. However it is not directly related with redistribution. So it >>>>>> should be provided after committing the redistribution work I believe. >>>>>> >>>>> >>>>> It will help in testing the feature. For example, you can just do \d >>>>> on the redistributed table, to see if catalogs have been updated correctly >>>>> or not. So, it's better to do it before this ALTER TABLE, so that you can >>>>> use it in the tests. It should been done when the work related to the >>>>> subcluster was done, even before when XC was started :). Anyway, earlier >>>>> the better. >>>>> >>>>> >>>>>> Also, I think we shouldn't use ¥d as it will impact other >>>>>> applications like pgadmin for instance. We should use an extension of ¥d >>>>>> like for example ¥dZ. This is just a suggestion, I don't know what are the >>>>>> commands still not in use. >>>>>> >>>>> >>>>> \d is for describing a relation at bare minimum. In XC distribution >>>>> strategy becomes an integral part of a relation, and thus should be part of >>>>> the \d output. Applications using \d will need a change, but how many >>>>> applications connect via psql to fire commands (very less, I guess), so we >>>>> are not in much trouble. If one compares changing grammar of say CREATE >>>>> TABLE after the first release, would be more problematic that this one. >>>>> >>>>> >>>>>> -- >>>>>> Michael Paquier >>>>>> http://michael.otacoo.com >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Wishes, >>>>> Ashutosh Bapat >>>>> EntepriseDB Corporation >>>>> The Enterprise Postgres Company >>>>> >>>>> >>>> >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
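As an illustration of the array_agg-based sanity check suggested in this review, with a made-up table (t_redist, id and val are example names, and the data set is deliberately tiny):

-- Example table and data; names and values are illustrative only.
CREATE TABLE t_redist (id int, val text) DISTRIBUTE BY HASH (id);
INSERT INTO t_redist VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');

-- Capture the full contents with a deterministic ordering inside the
-- aggregate (the aggregate's ORDER BY, not the SELECT's), plus a couple
-- of summary aggregates; the same query should return identical output
-- before and after the table is redistributed.
SELECT count(*), sum(id),
       array_agg(id  ORDER BY id) AS ids,
       array_agg(val ORDER BY id) AS vals
FROM t_redist;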
From: Michael P. <mic...@gm...> - 2012-07-10 12:02:44
On Tue, Jul 10, 2012 at 9:01 PM, Michael Paquier <mic...@gm...>wrote: > > > On Tue, Jul 10, 2012 at 8:56 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> Hi Michael, >> >> The patch looks good. Here is one comment I have - >> Is there a real need to add rule OptDistributeByInternal? The token >> OptDistributeByInternal is being used only once. Same is the case with >> OptSubClusterInternal. >> > Thanks. > Defining OptDistributeByInternal avoids conflicts bison shift-reduce > conflicts with empty fields if we plug in that extension somewhere else > with other queries. > Sorry I just meant "shift-reduce conflicts with bison". It typed too quickly here. :) > Hence, it is used as an extension to the list of available commands of > ALTER TABLE in its new grammar. > > > >> >> >> On Tue, Jul 10, 2012 at 8:27 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi Ashutosh, >>> >>> Please find attached the wanted patches: >>> 1) 20120710_grammar.patch, refactoring the grammar >>> 2) 20120710_refactor.patch, refactoring CREATE TABLE code for functions >>> related to distribution >>> 3) 20120710_remotecopy.patch, refactoring the COPY code into remote COPY. >>> In order to simplify my maintenance work and yours, I think you should >>> have a look at those patches before looking at the redistribution work. >>> Those patches are really simple, have no whitespace, no warnings, are >>> independant to each other, and each of them is essential for the >>> redistribution algorithm. >>> As they are really simple, please let's accelerate the review of those 3 >>> ones, commit them and move to the heart of the discussions. >>> >>> Thanks in advance. >>> >>> >>> On Mon, Jul 9, 2012 at 9:22 PM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> >>>> >>>> On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier < >>>> mic...@gm...> wrote: >>>> >>>>> >>>>> >>>>> On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < >>>>> ash...@en...> wrote: >>>>> >>>>>> Hi Michael, >>>>>> I had a look at the patch. I mainly focused on the overall content of >>>>>> the patch and importantly tests. Before I look at the redistribution code >>>>>> thoroughly, I have few comments. >>>>>> >>>>>> There are many trailing white spaces in the patch. Please fix those, >>>>>> they unnecessarily fail the automatic merges sometimes. You can do that >>>>>> when you commit the patch. >>>>>> >>>>> Oh OK, I didn't notice. Do you have some places particularly in mind? >>>>> >>>> >>>> Apply your patch on clean repository using git apply and it will show >>>> you. >>>> >>>> >>>>> >>>>> >>>>>> >>>>>> Code >>>>>> ==== >>>>>> 1. There is a lot of code, which is refactoring existing code, >>>>>> renaming functions, which is not necessarily related to redistribution >>>>>> work. Can you please provide separate patches for this refactoring? We >>>>>> should commit them separately. For example build_subcluster_data() has been >>>>>> renamed (for good may be), but it makes sense if we do it separately. >>>>>> Someone looking at the ALTER TABLE commit should not get overwhelmed by the >>>>>> extraneous changes. >>>>>> >>>>> OK. The problem with the functions currently on master was that there >>>>> name was not really generic and sometimes did not reflect their real >>>>> functionality. So as now the plan is to use them in a more general way, I >>>>> think there name is not going to change anymore. >>>>> >>>>> >>>>>> >>>>>> 2. Same is the case with the grammar changes. Please separate the >>>>>> grammar changes related to pgxc_nodelist etc. 
into separate patch, although >>>>>> it's because of ALTER TABLE you need to do those changes. >>>>>> >>>>> OK understood. >>>>> >>>>> >>>>>> >>>>>> Please get these patches reviewed as well, since I haven't looked at >>>>>> the changes proper. >>>>>> >>>>> Understood, I'll make those 2 patches on tomorrow morning, not a big >>>>> deal. >>>>> >>>>> >>>>>> >>>>>> Tests >>>>>> ===== >>>>>> 1. There is no need to test with huge data, that slows down >>>>>> regression. For performance testing, you can create a separate test (not to >>>>>> be included in regression), if you want. >>>>>> >>>>> That may be an idea. However you are right I'll limit the number of >>>>> rows tested. >>>>> >>>>> >>>>> >>>>>> 2. We need tests, which will test the plan cache (in)validation upon >>>>>> redistribution of data, tests for testing existing views working after the >>>>>> redistribution. Please take a look at the PG alter table test for more such >>>>>> scenarios. >>>>> >>>>> OK I'll add those scenarios. They will be included in xc_alter_table. >>>>> >>>>> >>>>>> If you happen to add some performance tests, it would be also good to >>>>>> test the sanity of concurrent transactions accessing the object/s being >>>>>> redistributed. It's vital considering that such redistribution would run >>>>>> for longer. >>>>>> >>>>> Yes, it would be nice to >>>>> >>>>> >>>>> >>>>>> 3. Instead of relying on count(*) to show sanity of the >>>>>> redistributed data, you may use better aggregates like array_agg or sum(), >>>>>> avg() and count(). I would prefer array_agg over others, since you can list >>>>>> all the data values there. You will need aggregate's order by clause (Not >>>>>> that of the SELECT). >>>>>> 4. In the case of redistribution of table with index, you will need >>>>>> to check the sanity of index after the redistribution by some means. >>>>>> >>>>> Do you have an idea of how to do that? Pick up some tests from >>>>> postgres? >>>>> >>>> >>>> Good question. But I don't have an answer (specifically for XC, since >>>> the indexes are on datanodes). >>>> >>>> >>>>> >>>>> >>>>>> 5. I did not understand the significance of the tests where you add >>>>>> and drop column and redistribute the data. The SELECT after the >>>>>> redistribution is not testing anything specific for the added/dropped >>>>>> column. >>>>>> >>>>> The internal, let's say default layer, of distribution mechanism uses >>>>> an internal COPY and it is important to do this check and correctly bypass >>>>> the columns that are dropped. The SELECT is just here to check that data >>>>> has been redistributed correctly. >>>>> >>>>> >>>>> >>>>>> 6. There are no testcases which would change the distribution type >>>>>> and node list at the same time. Please add those. (I am assuming that these >>>>>> two operations are possible together). >>>>>> >>>>> Yeah sorry, I have been working on that today and added some >>>>> additional tests that can do that. >>>>> They are in the bucket, just I didn't send the absolutely latest >>>>> version. >>>>> >>>>> >>>>>> 7. Negative testcases need to improved. >>>>>> >>>>> What are the negative test cases? It would be cool if you could >>>>> precise. >>>>> >>>> >>>> Tests which do negative testing ( >>>> http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) >>>> >>>> >>>>> >>>>> >>>>>> >>>>>> Additional feature >>>>>> ================== >>>>>> It will be helpful to add the distribution information in the output >>>>>> of \d command for tables. 
It will be good tool for tests to check whether >>>>>> the catalogs have been updated correctly or not. Please add this feature >>>>>> before we complete ALTER TABLE. It shouldn't take much time. Please provide >>>>>> this as a separate patch. >>>>>> >>>>> +1. >>>>> This is a good idea, and I recall we had this discussion a couple of >>>>> months ago. However it is not directly related with redistribution. So it >>>>> should be provided after committing the redistribution work I believe. >>>>> >>>> >>>> It will help in testing the feature. For example, you can just do \d on >>>> the redistributed table, to see if catalogs have been updated correctly or >>>> not. So, it's better to do it before this ALTER TABLE, so that you can use >>>> it in the tests. It should been done when the work related to the >>>> subcluster was done, even before when XC was started :). Anyway, earlier >>>> the better. >>>> >>>> >>>>> Also, I think we shouldn't use ¥d as it will impact other applications >>>>> like pgadmin for instance. We should use an extension of ¥d like for >>>>> example ¥dZ. This is just a suggestion, I don't know what are the commands >>>>> still not in use. >>>>> >>>> >>>> \d is for describing a relation at bare minimum. In XC distribution >>>> strategy becomes an integral part of a relation, and thus should be part of >>>> the \d output. Applications using \d will need a change, but how many >>>> applications connect via psql to fire commands (very less, I guess), so we >>>> are not in much trouble. If one compares changing grammar of say CREATE >>>> TABLE after the first release, would be more problematic that this one. >>>> >>>> >>>>> -- >>>>> Michael Paquier >>>>> http://michael.otacoo.com >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>> >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |
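Until \d prints the distribution, the catalogs can be queried directly in the tests. A sketch follows; the pgxc_class column names are written from memory and should be checked against the catalog definition before use.

-- Hypothetical check of the recorded distribution for a table; verify
-- the pgxc_class column names against the actual catalog.
SELECT c.relname,
       x.pclocatortype,   -- e.g. 'H' hash, 'R' replication, 'N' round robin
       x.pcattnum         -- distribution column number, when applicable
FROM pg_class c
JOIN pgxc_class x ON x.pcrelid = c.oid
WHERE c.relname = 't_redist';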
From: Michael P. <mic...@gm...> - 2012-07-10 12:01:38
On Tue, Jul 10, 2012 at 8:56 PM, Ashutosh Bapat < ash...@en...> wrote: > Hi Michael, > The patch looks good. Here is one comment I have - > Is there a real need to add rule OptDistributeByInternal? The token > OptDistributeByInternal is being used only once. Same is the case with > OptSubClusterInternal. Thanks. Defining OptDistributeByInternal avoids conflicts bison shift-reduce conflicts with empty fields if we plug in that extension somewhere else with other queries. Hence, it is used as an extension to the list of available commands of ALTER TABLE in its new grammar. > > > On Tue, Jul 10, 2012 at 8:27 AM, Michael Paquier < > mic...@gm...> wrote: > >> Hi Ashutosh, >> >> Please find attached the wanted patches: >> 1) 20120710_grammar.patch, refactoring the grammar >> 2) 20120710_refactor.patch, refactoring CREATE TABLE code for functions >> related to distribution >> 3) 20120710_remotecopy.patch, refactoring the COPY code into remote COPY. >> In order to simplify my maintenance work and yours, I think you should >> have a look at those patches before looking at the redistribution work. >> Those patches are really simple, have no whitespace, no warnings, are >> independant to each other, and each of them is essential for the >> redistribution algorithm. >> As they are really simple, please let's accelerate the review of those 3 >> ones, commit them and move to the heart of the discussions. >> >> Thanks in advance. >> >> >> On Mon, Jul 9, 2012 at 9:22 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> >>> >>> On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> >>>> >>>> On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < >>>> ash...@en...> wrote: >>>> >>>>> Hi Michael, >>>>> I had a look at the patch. I mainly focused on the overall content of >>>>> the patch and importantly tests. Before I look at the redistribution code >>>>> thoroughly, I have few comments. >>>>> >>>>> There are many trailing white spaces in the patch. Please fix those, >>>>> they unnecessarily fail the automatic merges sometimes. You can do that >>>>> when you commit the patch. >>>>> >>>> Oh OK, I didn't notice. Do you have some places particularly in mind? >>>> >>> >>> Apply your patch on clean repository using git apply and it will show >>> you. >>> >>> >>>> >>>> >>>>> >>>>> Code >>>>> ==== >>>>> 1. There is a lot of code, which is refactoring existing code, >>>>> renaming functions, which is not necessarily related to redistribution >>>>> work. Can you please provide separate patches for this refactoring? We >>>>> should commit them separately. For example build_subcluster_data() has been >>>>> renamed (for good may be), but it makes sense if we do it separately. >>>>> Someone looking at the ALTER TABLE commit should not get overwhelmed by the >>>>> extraneous changes. >>>>> >>>> OK. The problem with the functions currently on master was that there >>>> name was not really generic and sometimes did not reflect their real >>>> functionality. So as now the plan is to use them in a more general way, I >>>> think there name is not going to change anymore. >>>> >>>> >>>>> >>>>> 2. Same is the case with the grammar changes. Please separate the >>>>> grammar changes related to pgxc_nodelist etc. into separate patch, although >>>>> it's because of ALTER TABLE you need to do those changes. >>>>> >>>> OK understood. >>>> >>>> >>>>> >>>>> Please get these patches reviewed as well, since I haven't looked at >>>>> the changes proper. 
>>>>> >>>> Understood, I'll make those 2 patches on tomorrow morning, not a big >>>> deal. >>>> >>>> >>>>> >>>>> Tests >>>>> ===== >>>>> 1. There is no need to test with huge data, that slows down >>>>> regression. For performance testing, you can create a separate test (not to >>>>> be included in regression), if you want. >>>>> >>>> That may be an idea. However you are right I'll limit the number of >>>> rows tested. >>>> >>>> >>>> >>>>> 2. We need tests, which will test the plan cache (in)validation upon >>>>> redistribution of data, tests for testing existing views working after the >>>>> redistribution. Please take a look at the PG alter table test for more such >>>>> scenarios. >>>> >>>> OK I'll add those scenarios. They will be included in xc_alter_table. >>>> >>>> >>>>> If you happen to add some performance tests, it would be also good to >>>>> test the sanity of concurrent transactions accessing the object/s being >>>>> redistributed. It's vital considering that such redistribution would run >>>>> for longer. >>>>> >>>> Yes, it would be nice to >>>> >>>> >>>> >>>>> 3. Instead of relying on count(*) to show sanity of the redistributed >>>>> data, you may use better aggregates like array_agg or sum(), avg() and >>>>> count(). I would prefer array_agg over others, since you can list all the >>>>> data values there. You will need aggregate's order by clause (Not that of >>>>> the SELECT). >>>>> 4. In the case of redistribution of table with index, you will need to >>>>> check the sanity of index after the redistribution by some means. >>>>> >>>> Do you have an idea of how to do that? Pick up some tests from postgres? >>>> >>> >>> Good question. But I don't have an answer (specifically for XC, since >>> the indexes are on datanodes). >>> >>> >>>> >>>> >>>>> 5. I did not understand the significance of the tests where you add >>>>> and drop column and redistribute the data. The SELECT after the >>>>> redistribution is not testing anything specific for the added/dropped >>>>> column. >>>>> >>>> The internal, let's say default layer, of distribution mechanism uses >>>> an internal COPY and it is important to do this check and correctly bypass >>>> the columns that are dropped. The SELECT is just here to check that data >>>> has been redistributed correctly. >>>> >>>> >>>> >>>>> 6. There are no testcases which would change the distribution type and >>>>> node list at the same time. Please add those. (I am assuming that these two >>>>> operations are possible together). >>>>> >>>> Yeah sorry, I have been working on that today and added some additional >>>> tests that can do that. >>>> They are in the bucket, just I didn't send the absolutely latest >>>> version. >>>> >>>> >>>>> 7. Negative testcases need to improved. >>>>> >>>> What are the negative test cases? It would be cool if you could precise. >>>> >>> >>> Tests which do negative testing ( >>> http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) >>> >>> >>>> >>>> >>>>> >>>>> Additional feature >>>>> ================== >>>>> It will be helpful to add the distribution information in the output >>>>> of \d command for tables. It will be good tool for tests to check whether >>>>> the catalogs have been updated correctly or not. Please add this feature >>>>> before we complete ALTER TABLE. It shouldn't take much time. Please provide >>>>> this as a separate patch. >>>>> >>>> +1. >>>> This is a good idea, and I recall we had this discussion a couple of >>>> months ago. 
However it is not directly related with redistribution. So it >>>> should be provided after committing the redistribution work I believe. >>>> >>> >>> It will help in testing the feature. For example, you can just do \d on >>> the redistributed table, to see if catalogs have been updated correctly or >>> not. So, it's better to do it before this ALTER TABLE, so that you can use >>> it in the tests. It should been done when the work related to the >>> subcluster was done, even before when XC was started :). Anyway, earlier >>> the better. >>> >>> >>>> Also, I think we shouldn't use ¥d as it will impact other applications >>>> like pgadmin for instance. We should use an extension of ¥d like for >>>> example ¥dZ. This is just a suggestion, I don't know what are the commands >>>> still not in use. >>>> >>> >>> \d is for describing a relation at bare minimum. In XC distribution >>> strategy becomes an integral part of a relation, and thus should be part of >>> the \d output. Applications using \d will need a change, but how many >>> applications connect via psql to fire commands (very less, I guess), so we >>> are not in much trouble. If one compares changing grammar of say CREATE >>> TABLE after the first release, would be more problematic that this one. >>> >>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > -- Michael Paquier http://michael.otacoo.com |
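To illustrate the shift/reduce point being made here, a simplified grammar fragment, not the actual gram.y contents; the keywords and actions are placeholders, and only the rule names come from the patch discussion.

/* Simplified illustration only.  The optional wrapper owns the empty
 * alternative (for the places where the clause may be absent), while the
 * *Internal rule never matches empty, so it can also be listed among the
 * ALTER TABLE subcommands without the empty alternative introducing
 * shift/reduce conflicts there. */
OptDistributeBy:
          OptDistributeByInternal            { $$ = $1; }
        | /* EMPTY */                        { $$ = NULL; }
        ;

OptDistributeByInternal:
          DISTRIBUTE BY HASH '(' name ')'    { /* build a hash distribution spec */ }
        | DISTRIBUTE BY REPLICATION          { /* build a replication spec */ }
        ;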
From: Ashutosh B. <ash...@en...> - 2012-07-10 11:56:52
|
Hi Michael, The patch looks good. Here is one comment I have - Is there a real need to add rule OptDistributeByInternal? The token OptDistributeByInternal is being used only once. Same is the case with OptSubClusterInternal. On Tue, Jul 10, 2012 at 8:27 AM, Michael Paquier <mic...@gm...>wrote: > Hi Ashutosh, > > Please find attached the wanted patches: > 1) 20120710_grammar.patch, refactoring the grammar > 2) 20120710_refactor.patch, refactoring CREATE TABLE code for functions > related to distribution > 3) 20120710_remotecopy.patch, refactoring the COPY code into remote COPY. > In order to simplify my maintenance work and yours, I think you should > have a look at those patches before looking at the redistribution work. > Those patches are really simple, have no whitespace, no warnings, are > independant to each other, and each of them is essential for the > redistribution algorithm. > As they are really simple, please let's accelerate the review of those 3 > ones, commit them and move to the heart of the discussions. > > Thanks in advance. > > > On Mon, Jul 9, 2012 at 9:22 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> >> >> On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier < >> mic...@gm...> wrote: >> >>> >>> >>> On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> Hi Michael, >>>> I had a look at the patch. I mainly focused on the overall content of >>>> the patch and importantly tests. Before I look at the redistribution code >>>> thoroughly, I have few comments. >>>> >>>> There are many trailing white spaces in the patch. Please fix those, >>>> they unnecessarily fail the automatic merges sometimes. You can do that >>>> when you commit the patch. >>>> >>> Oh OK, I didn't notice. Do you have some places particularly in mind? >>> >> >> Apply your patch on clean repository using git apply and it will show you. >> >> >>> >>> >>>> >>>> Code >>>> ==== >>>> 1. There is a lot of code, which is refactoring existing code, renaming >>>> functions, which is not necessarily related to redistribution work. Can you >>>> please provide separate patches for this refactoring? We should commit them >>>> separately. For example build_subcluster_data() has been renamed (for good >>>> may be), but it makes sense if we do it separately. Someone looking at the >>>> ALTER TABLE commit should not get overwhelmed by the extraneous changes. >>>> >>> OK. The problem with the functions currently on master was that there >>> name was not really generic and sometimes did not reflect their real >>> functionality. So as now the plan is to use them in a more general way, I >>> think there name is not going to change anymore. >>> >>> >>>> >>>> 2. Same is the case with the grammar changes. Please separate the >>>> grammar changes related to pgxc_nodelist etc. into separate patch, although >>>> it's because of ALTER TABLE you need to do those changes. >>>> >>> OK understood. >>> >>> >>>> >>>> Please get these patches reviewed as well, since I haven't looked at >>>> the changes proper. >>>> >>> Understood, I'll make those 2 patches on tomorrow morning, not a big >>> deal. >>> >>> >>>> >>>> Tests >>>> ===== >>>> 1. There is no need to test with huge data, that slows down regression. >>>> For performance testing, you can create a separate test (not to be included >>>> in regression), if you want. >>>> >>> That may be an idea. However you are right I'll limit the number of >>> rows tested. >>> >>> >>> >>>> 2. 
We need tests, which will test the plan cache (in)validation upon >>>> redistribution of data, tests for testing existing views working after the >>>> redistribution. Please take a look at the PG alter table test for more such >>>> scenarios. >>> >>> OK I'll add those scenarios. They will be included in xc_alter_table. >>> >>> >>>> If you happen to add some performance tests, it would be also good to >>>> test the sanity of concurrent transactions accessing the object/s being >>>> redistributed. It's vital considering that such redistribution would run >>>> for longer. >>>> >>> Yes, it would be nice to >>> >>> >>> >>>> 3. Instead of relying on count(*) to show sanity of the redistributed >>>> data, you may use better aggregates like array_agg or sum(), avg() and >>>> count(). I would prefer array_agg over others, since you can list all the >>>> data values there. You will need aggregate's order by clause (Not that of >>>> the SELECT). >>>> 4. In the case of redistribution of table with index, you will need to >>>> check the sanity of index after the redistribution by some means. >>>> >>> Do you have an idea of how to do that? Pick up some tests from postgres? >>> >> >> Good question. But I don't have an answer (specifically for XC, since the >> indexes are on datanodes). >> >> >>> >>> >>>> 5. I did not understand the significance of the tests where you add and >>>> drop column and redistribute the data. The SELECT after the redistribution >>>> is not testing anything specific for the added/dropped column. >>>> >>> The internal, let's say default layer, of distribution mechanism uses an >>> internal COPY and it is important to do this check and correctly bypass the >>> columns that are dropped. The SELECT is just here to check that data has >>> been redistributed correctly. >>> >>> >>> >>>> 6. There are no testcases which would change the distribution type and >>>> node list at the same time. Please add those. (I am assuming that these two >>>> operations are possible together). >>>> >>> Yeah sorry, I have been working on that today and added some additional >>> tests that can do that. >>> They are in the bucket, just I didn't send the absolutely latest version. >>> >>> >>>> 7. Negative testcases need to improved. >>>> >>> What are the negative test cases? It would be cool if you could precise. >>> >> >> Tests which do negative testing ( >> http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) >> >> >>> >>> >>>> >>>> Additional feature >>>> ================== >>>> It will be helpful to add the distribution information in the output of >>>> \d command for tables. It will be good tool for tests to check whether the >>>> catalogs have been updated correctly or not. Please add this feature before >>>> we complete ALTER TABLE. It shouldn't take much time. Please provide this >>>> as a separate patch. >>>> >>> +1. >>> This is a good idea, and I recall we had this discussion a couple of >>> months ago. However it is not directly related with redistribution. So it >>> should be provided after committing the redistribution work I believe. >>> >> >> It will help in testing the feature. For example, you can just do \d on >> the redistributed table, to see if catalogs have been updated correctly or >> not. So, it's better to do it before this ALTER TABLE, so that you can use >> it in the tests. It should been done when the work related to the >> subcluster was done, even before when XC was started :). Anyway, earlier >> the better. 
>> >> >>> Also, I think we shouldn't use ¥d as it will impact other applications >>> like pgadmin for instance. We should use an extension of ¥d like for >>> example ¥dZ. This is just a suggestion, I don't know what are the commands >>> still not in use. >>> >> >> \d is for describing a relation at bare minimum. In XC distribution >> strategy becomes an integral part of a relation, and thus should be part of >> the \d output. Applications using \d will need a change, but how many >> applications connect via psql to fire commands (very less, I guess), so we >> are not in much trouble. If one compares changing grammar of say CREATE >> TABLE after the first release, would be more problematic that this one. >> >> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
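Until \d (or a \d variant) prints the distribution, tests can read the catalog directly. A sketch, with the caveat that the column names below are taken from the current pgxc_class definition and are an assumption, as is the table name:

    -- Check that the distribution catalog was updated for table t1.
    SELECT c.relname, x.pclocatortype, x.pcattnum, x.nodeoids
      FROM pgxc_class x
      JOIN pg_class c ON c.oid = x.pcrelid
     WHERE c.relname = 't1';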
From: Michael P. <mic...@gm...> - 2012-07-10 10:16:41
|
On Tue, Jul 10, 2012 at 7:14 PM, Ashutosh Bapat < ash...@en...> wrote: > > >> 8. Probably not your code but, Function BuildRelationDistributionNodes() >>> does a repalloc() for every new nodeoid it finds. Each repalloc is costly. >>> Instead we can allocate memory large enough to contain all members of the >>> list passed. If there are node repeated (which will be less likely), we >>> will waste a few bytes, but won't be as expensive as calling repalloc(). >>> >> So doing a huge palloc done once scalled with the number of Datanodes? >> > > Yes. The palloc is as large as the number of nodes specified (including > duplicates) and not the actual number of datanodes available. > > >> >> >>> 9. All the renamed functions are marked as "extern", do you really need >>> them so? Also, I don't understand why these functions are located in heap.c? >>> >> It is their historical place. Want to move them away if possible? >> >> >>> I hope regression is sane. >> >> *Regression IS sane*. I already checked for each patch. You should stop >> to worry. >> If you have that many comments also for the other 2 patches, which are >> only the base.... >> It might take a looooonng time. >> Please also consider this. >> > > I am fine with whatever time it takes as long as the end result is in good > shape. > Just to reformulate: we are not the only members of the core team. And this work needs to be finished in time. Thanks. > > >> Thanks. >> >> >> >>> >>> >>> On Tue, Jul 10, 2012 at 8:27 AM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi Ashutosh, >>>> >>>> Please find attached the wanted patches: >>>> 1) 20120710_grammar.patch, refactoring the grammar >>>> 2) 20120710_refactor.patch, refactoring CREATE TABLE code for functions >>>> related to distribution >>>> 3) 20120710_remotecopy.patch, refactoring the COPY code into remote >>>> COPY. >>>> In order to simplify my maintenance work and yours, I think you should >>>> have a look at those patches before looking at the redistribution work. >>>> Those patches are really simple, have no whitespace, no warnings, are >>>> independant to each other, and each of them is essential for the >>>> redistribution algorithm. >>>> As they are really simple, please let's accelerate the review of those >>>> 3 ones, commit them and move to the heart of the discussions. >>>> >>>> Thanks in advance. >>>> >>>> >>>> On Mon, Jul 9, 2012 at 9:22 PM, Ashutosh Bapat < >>>> ash...@en...> wrote: >>>> >>>>> >>>>> >>>>> On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier < >>>>> mic...@gm...> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < >>>>>> ash...@en...> wrote: >>>>>> >>>>>>> Hi Michael, >>>>>>> I had a look at the patch. I mainly focused on the overall content >>>>>>> of the patch and importantly tests. Before I look at the redistribution >>>>>>> code thoroughly, I have few comments. >>>>>>> >>>>>>> There are many trailing white spaces in the patch. Please fix those, >>>>>>> they unnecessarily fail the automatic merges sometimes. You can do that >>>>>>> when you commit the patch. >>>>>>> >>>>>> Oh OK, I didn't notice. Do you have some places particularly in mind? >>>>>> >>>>> >>>>> Apply your patch on clean repository using git apply and it will show >>>>> you. >>>>> >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Code >>>>>>> ==== >>>>>>> 1. There is a lot of code, which is refactoring existing code, >>>>>>> renaming functions, which is not necessarily related to redistribution >>>>>>> work. Can you please provide separate patches for this refactoring? 
We >>>>>>> should commit them separately. For example build_subcluster_data() has been >>>>>>> renamed (for good may be), but it makes sense if we do it separately. >>>>>>> Someone looking at the ALTER TABLE commit should not get overwhelmed by the >>>>>>> extraneous changes. >>>>>>> >>>>>> OK. The problem with the functions currently on master was that there >>>>>> name was not really generic and sometimes did not reflect their real >>>>>> functionality. So as now the plan is to use them in a more general way, I >>>>>> think there name is not going to change anymore. >>>>>> >>>>>> >>>>>>> >>>>>>> 2. Same is the case with the grammar changes. Please separate the >>>>>>> grammar changes related to pgxc_nodelist etc. into separate patch, although >>>>>>> it's because of ALTER TABLE you need to do those changes. >>>>>>> >>>>>> OK understood. >>>>>> >>>>>> >>>>>>> >>>>>>> Please get these patches reviewed as well, since I haven't looked at >>>>>>> the changes proper. >>>>>>> >>>>>> Understood, I'll make those 2 patches on tomorrow morning, not a big >>>>>> deal. >>>>>> >>>>>> >>>>>>> >>>>>>> Tests >>>>>>> ===== >>>>>>> 1. There is no need to test with huge data, that slows down >>>>>>> regression. For performance testing, you can create a separate test (not to >>>>>>> be included in regression), if you want. >>>>>>> >>>>>> That may be an idea. However you are right I'll limit the number of >>>>>> rows tested. >>>>>> >>>>>> >>>>>> >>>>>>> 2. We need tests, which will test the plan cache (in)validation upon >>>>>>> redistribution of data, tests for testing existing views working after the >>>>>>> redistribution. Please take a look at the PG alter table test for more such >>>>>>> scenarios. >>>>>> >>>>>> OK I'll add those scenarios. They will be included in xc_alter_table. >>>>>> >>>>>> >>>>>>> If you happen to add some performance tests, it would be also good >>>>>>> to test the sanity of concurrent transactions accessing the object/s being >>>>>>> redistributed. It's vital considering that such redistribution would run >>>>>>> for longer. >>>>>>> >>>>>> Yes, it would be nice to >>>>>> >>>>>> >>>>>> >>>>>>> 3. Instead of relying on count(*) to show sanity of the >>>>>>> redistributed data, you may use better aggregates like array_agg or sum(), >>>>>>> avg() and count(). I would prefer array_agg over others, since you can list >>>>>>> all the data values there. You will need aggregate's order by clause (Not >>>>>>> that of the SELECT). >>>>>>> 4. In the case of redistribution of table with index, you will need >>>>>>> to check the sanity of index after the redistribution by some means. >>>>>>> >>>>>> Do you have an idea of how to do that? Pick up some tests from >>>>>> postgres? >>>>>> >>>>> >>>>> Good question. But I don't have an answer (specifically for XC, since >>>>> the indexes are on datanodes). >>>>> >>>>> >>>>>> >>>>>> >>>>>>> 5. I did not understand the significance of the tests where you add >>>>>>> and drop column and redistribute the data. The SELECT after the >>>>>>> redistribution is not testing anything specific for the added/dropped >>>>>>> column. >>>>>>> >>>>>> The internal, let's say default layer, of distribution mechanism uses >>>>>> an internal COPY and it is important to do this check and correctly bypass >>>>>> the columns that are dropped. The SELECT is just here to check that data >>>>>> has been redistributed correctly. >>>>>> >>>>>> >>>>>> >>>>>>> 6. There are no testcases which would change the distribution type >>>>>>> and node list at the same time. 
Please add those. (I am assuming that these >>>>>>> two operations are possible together). >>>>>>> >>>>>> Yeah sorry, I have been working on that today and added some >>>>>> additional tests that can do that. >>>>>> They are in the bucket, just I didn't send the absolutely latest >>>>>> version. >>>>>> >>>>>> >>>>>>> 7. Negative testcases need to improved. >>>>>>> >>>>>> What are the negative test cases? It would be cool if you could >>>>>> precise. >>>>>> >>>>> >>>>> Tests which do negative testing ( >>>>> http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) >>>>> >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Additional feature >>>>>>> ================== >>>>>>> It will be helpful to add the distribution information in the output >>>>>>> of \d command for tables. It will be good tool for tests to check whether >>>>>>> the catalogs have been updated correctly or not. Please add this feature >>>>>>> before we complete ALTER TABLE. It shouldn't take much time. Please provide >>>>>>> this as a separate patch. >>>>>>> >>>>>> +1. >>>>>> This is a good idea, and I recall we had this discussion a couple of >>>>>> months ago. However it is not directly related with redistribution. So it >>>>>> should be provided after committing the redistribution work I believe. >>>>>> >>>>> >>>>> It will help in testing the feature. For example, you can just do \d >>>>> on the redistributed table, to see if catalogs have been updated correctly >>>>> or not. So, it's better to do it before this ALTER TABLE, so that you can >>>>> use it in the tests. It should been done when the work related to the >>>>> subcluster was done, even before when XC was started :). Anyway, earlier >>>>> the better. >>>>> >>>>> >>>>>> Also, I think we shouldn't use ¥d as it will impact other >>>>>> applications like pgadmin for instance. We should use an extension of ¥d >>>>>> like for example ¥dZ. This is just a suggestion, I don't know what are the >>>>>> commands still not in use. >>>>>> >>>>> >>>>> \d is for describing a relation at bare minimum. In XC distribution >>>>> strategy becomes an integral part of a relation, and thus should be part of >>>>> the \d output. Applications using \d will need a change, but how many >>>>> applications connect via psql to fire commands (very less, I guess), so we >>>>> are not in much trouble. If one compares changing grammar of say CREATE >>>>> TABLE after the first release, would be more problematic that this one. >>>>> >>>>> >>>>>> -- >>>>>> Michael Paquier >>>>>> http://michael.otacoo.com >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Wishes, >>>>> Ashutosh Bapat >>>>> EntepriseDB Corporation >>>>> The Enterprise Postgres Company >>>>> >>>>> >>>> >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > -- Michael Paquier http://michael.otacoo.com |
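For reference, comment 8 above (one allocation sized for all the nodes specified, duplicates included, instead of a repalloc() per node) boils down to something like the sketch below. This is not the actual BuildRelationDistributionNodes() code: it assumes the input is already a list of node OIDs and that the usual PostgreSQL headers (postgres.h, nodes/pg_list.h) are in scope.

    /*
     * Sketch only: build an array of distinct node OIDs with a single
     * palloc() sized for the worst case (every entry distinct).
     */
    static Oid *
    build_distinct_nodeoids(List *nodelist, int *numnodes)
    {
        Oid        *nodeoids = (Oid *) palloc(list_length(nodelist) * sizeof(Oid));
        int         count = 0;
        ListCell   *lc;

        foreach(lc, nodelist)
        {
            Oid         nodeoid = lfirst_oid(lc);
            int         i;

            /* linear duplicate check; a duplicate only wastes one slot */
            for (i = 0; i < count; i++)
                if (nodeoids[i] == nodeoid)
                    break;
            if (i == count)
                nodeoids[count++] = nodeoid;
        }

        *numnodes = count;
        return nodeoids;
    }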
From: Ashutosh B. <ash...@en...> - 2012-07-10 10:14:23
|
> 8. Probably not your code but, Function BuildRelationDistributionNodes() >> does a repalloc() for every new nodeoid it finds. Each repalloc is costly. >> Instead we can allocate memory large enough to contain all members of the >> list passed. If there are node repeated (which will be less likely), we >> will waste a few bytes, but won't be as expensive as calling repalloc(). >> > So doing a huge palloc done once scalled with the number of Datanodes? > Yes. The palloc is as large as the number of nodes specified (including duplicates) and not the actual number of datanodes available. > > >> 9. All the renamed functions are marked as "extern", do you really need >> them so? Also, I don't understand why these functions are located in heap.c? >> > It is their historical place. Want to move them away if possible? > > >> I hope regression is sane. > > *Regression IS sane*. I already checked for each patch. You should stop > to worry. > If you have that many comments also for the other 2 patches, which are > only the base.... > It might take a looooonng time. > Please also consider this. > I am fine with whatever time it takes as long as the end result is in good shape. > Thanks. > > > >> >> >> On Tue, Jul 10, 2012 at 8:27 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi Ashutosh, >>> >>> Please find attached the wanted patches: >>> 1) 20120710_grammar.patch, refactoring the grammar >>> 2) 20120710_refactor.patch, refactoring CREATE TABLE code for functions >>> related to distribution >>> 3) 20120710_remotecopy.patch, refactoring the COPY code into remote COPY. >>> In order to simplify my maintenance work and yours, I think you should >>> have a look at those patches before looking at the redistribution work. >>> Those patches are really simple, have no whitespace, no warnings, are >>> independant to each other, and each of them is essential for the >>> redistribution algorithm. >>> As they are really simple, please let's accelerate the review of those 3 >>> ones, commit them and move to the heart of the discussions. >>> >>> Thanks in advance. >>> >>> >>> On Mon, Jul 9, 2012 at 9:22 PM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> >>>> >>>> On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier < >>>> mic...@gm...> wrote: >>>> >>>>> >>>>> >>>>> On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < >>>>> ash...@en...> wrote: >>>>> >>>>>> Hi Michael, >>>>>> I had a look at the patch. I mainly focused on the overall content of >>>>>> the patch and importantly tests. Before I look at the redistribution code >>>>>> thoroughly, I have few comments. >>>>>> >>>>>> There are many trailing white spaces in the patch. Please fix those, >>>>>> they unnecessarily fail the automatic merges sometimes. You can do that >>>>>> when you commit the patch. >>>>>> >>>>> Oh OK, I didn't notice. Do you have some places particularly in mind? >>>>> >>>> >>>> Apply your patch on clean repository using git apply and it will show >>>> you. >>>> >>>> >>>>> >>>>> >>>>>> >>>>>> Code >>>>>> ==== >>>>>> 1. There is a lot of code, which is refactoring existing code, >>>>>> renaming functions, which is not necessarily related to redistribution >>>>>> work. Can you please provide separate patches for this refactoring? We >>>>>> should commit them separately. For example build_subcluster_data() has been >>>>>> renamed (for good may be), but it makes sense if we do it separately. >>>>>> Someone looking at the ALTER TABLE commit should not get overwhelmed by the >>>>>> extraneous changes. >>>>>> >>>>> OK. 
The problem with the functions currently on master was that there >>>>> name was not really generic and sometimes did not reflect their real >>>>> functionality. So as now the plan is to use them in a more general way, I >>>>> think there name is not going to change anymore. >>>>> >>>>> >>>>>> >>>>>> 2. Same is the case with the grammar changes. Please separate the >>>>>> grammar changes related to pgxc_nodelist etc. into separate patch, although >>>>>> it's because of ALTER TABLE you need to do those changes. >>>>>> >>>>> OK understood. >>>>> >>>>> >>>>>> >>>>>> Please get these patches reviewed as well, since I haven't looked at >>>>>> the changes proper. >>>>>> >>>>> Understood, I'll make those 2 patches on tomorrow morning, not a big >>>>> deal. >>>>> >>>>> >>>>>> >>>>>> Tests >>>>>> ===== >>>>>> 1. There is no need to test with huge data, that slows down >>>>>> regression. For performance testing, you can create a separate test (not to >>>>>> be included in regression), if you want. >>>>>> >>>>> That may be an idea. However you are right I'll limit the number of >>>>> rows tested. >>>>> >>>>> >>>>> >>>>>> 2. We need tests, which will test the plan cache (in)validation upon >>>>>> redistribution of data, tests for testing existing views working after the >>>>>> redistribution. Please take a look at the PG alter table test for more such >>>>>> scenarios. >>>>> >>>>> OK I'll add those scenarios. They will be included in xc_alter_table. >>>>> >>>>> >>>>>> If you happen to add some performance tests, it would be also good to >>>>>> test the sanity of concurrent transactions accessing the object/s being >>>>>> redistributed. It's vital considering that such redistribution would run >>>>>> for longer. >>>>>> >>>>> Yes, it would be nice to >>>>> >>>>> >>>>> >>>>>> 3. Instead of relying on count(*) to show sanity of the >>>>>> redistributed data, you may use better aggregates like array_agg or sum(), >>>>>> avg() and count(). I would prefer array_agg over others, since you can list >>>>>> all the data values there. You will need aggregate's order by clause (Not >>>>>> that of the SELECT). >>>>>> 4. In the case of redistribution of table with index, you will need >>>>>> to check the sanity of index after the redistribution by some means. >>>>>> >>>>> Do you have an idea of how to do that? Pick up some tests from >>>>> postgres? >>>>> >>>> >>>> Good question. But I don't have an answer (specifically for XC, since >>>> the indexes are on datanodes). >>>> >>>> >>>>> >>>>> >>>>>> 5. I did not understand the significance of the tests where you add >>>>>> and drop column and redistribute the data. The SELECT after the >>>>>> redistribution is not testing anything specific for the added/dropped >>>>>> column. >>>>>> >>>>> The internal, let's say default layer, of distribution mechanism uses >>>>> an internal COPY and it is important to do this check and correctly bypass >>>>> the columns that are dropped. The SELECT is just here to check that data >>>>> has been redistributed correctly. >>>>> >>>>> >>>>> >>>>>> 6. There are no testcases which would change the distribution type >>>>>> and node list at the same time. Please add those. (I am assuming that these >>>>>> two operations are possible together). >>>>>> >>>>> Yeah sorry, I have been working on that today and added some >>>>> additional tests that can do that. >>>>> They are in the bucket, just I didn't send the absolutely latest >>>>> version. >>>>> >>>>> >>>>>> 7. Negative testcases need to improved. 
>>>>>> >>>>> What are the negative test cases? It would be cool if you could >>>>> precise. >>>>> >>>> >>>> Tests which do negative testing ( >>>> http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) >>>> >>>> >>>>> >>>>> >>>>>> >>>>>> Additional feature >>>>>> ================== >>>>>> It will be helpful to add the distribution information in the output >>>>>> of \d command for tables. It will be good tool for tests to check whether >>>>>> the catalogs have been updated correctly or not. Please add this feature >>>>>> before we complete ALTER TABLE. It shouldn't take much time. Please provide >>>>>> this as a separate patch. >>>>>> >>>>> +1. >>>>> This is a good idea, and I recall we had this discussion a couple of >>>>> months ago. However it is not directly related with redistribution. So it >>>>> should be provided after committing the redistribution work I believe. >>>>> >>>> >>>> It will help in testing the feature. For example, you can just do \d on >>>> the redistributed table, to see if catalogs have been updated correctly or >>>> not. So, it's better to do it before this ALTER TABLE, so that you can use >>>> it in the tests. It should been done when the work related to the >>>> subcluster was done, even before when XC was started :). Anyway, earlier >>>> the better. >>>> >>>> >>>>> Also, I think we shouldn't use ¥d as it will impact other applications >>>>> like pgadmin for instance. We should use an extension of ¥d like for >>>>> example ¥dZ. This is just a suggestion, I don't know what are the commands >>>>> still not in use. >>>>> >>>> >>>> \d is for describing a relation at bare minimum. In XC distribution >>>> strategy becomes an integral part of a relation, and thus should be part of >>>> the \d output. Applications using \d will need a change, but how many >>>> applications connect via psql to fire commands (very less, I guess), so we >>>> are not in much trouble. If one compares changing grammar of say CREATE >>>> TABLE after the first release, would be more problematic that this one. >>>> >>>> >>>>> -- >>>>> Michael Paquier >>>>> http://michael.otacoo.com >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>> >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
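On comment 7, negative tests are simply ones that exercise the error paths of the new ALTER TABLE forms and check the error messages in the expected output. A rough sketch (table, column and node names are invented, the TO NODE form follows the patch under review, and whether xml is rejected depends on IsTypeHashDistributable):

    CREATE TABLE t_neg (id int, doc xml) DISTRIBUTE BY REPLICATION;

    -- Both statements are expected to fail; the regression test captures
    -- the error messages.
    ALTER TABLE t_neg DISTRIBUTE BY HASH (doc);    -- type not hash-distributable
    ALTER TABLE t_neg TO NODE (no_such_datanode);  -- unknown node name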
From: Michael P. <mic...@gm...> - 2012-07-10 10:09:53
|
On Tue, Jul 10, 2012 at 6:41 PM, Ashutosh Bapat < ash...@en...> wrote: > Hi Michael, > Comments on 20120710_refactor patch. > > 1. Please name the variables as local_hashalgorithm instead of > hashalgorithm_loc (loc is also used mean the location info e.g Get > RelationLocInfo()). > OK. > 2. Please rename IsHashDistributable as IsTypeHashDistributable(). Same > case with IsModuloDistributable(). > OK. > 3. In function prologues, add information about input and output > variables, esp. in case of GetRelationDistributionItems(). > OK. > 4. This is not change in this patch, but good if you can accommodate it. > At line 1102, there is a switch case, which has action only for a single > case, so, it better be replaced with an "if". > Hum. I'll look at it. There may be a reason why I used a switch there. > 5. SortRelationDistributionNodes, better be a macro, as it's not doing > anything but call qsort(). > OK. > 6. Following comment doesn't make much sense, please remove it. The > executor state at the time of table creation and querying can be completely > different. There is no connection > 1218 * We should use session data because Executor uses it as > well to run > 1219 * commands on nodes. > OK. > 7. In GetRelationDistributionNodes(), there are three places, node sorting > function is called. Instead, you should just nodeoid array at these three > places and call the sorting function at the end. In case we need to add > another if case in that function, to get array of nodes in some other way, > one has to remember to add the call to sort the nodes array, which can be > avoided if you add the call to sort function at the end. > BuildRelationDistributionNodes() sorts the nodeoids inside it, but you can > take that call out of this function. > I'll see about that. > 8. Probably not your code but, Function BuildRelationDistributionNodes() > does a repalloc() for every new nodeoid it finds. Each repalloc is costly. > Instead we can allocate memory large enough to contain all members of the > list passed. If there are node repeated (which will be less likely), we > will waste a few bytes, but won't be as expensive as calling repalloc(). > So doing a huge palloc done once scalled with the number of Datanodes? > 9. All the renamed functions are marked as "extern", do you really need > them so? Also, I don't understand why these functions are located in heap.c? > It is their historical place. Want to move them away if possible? > I hope regression is sane. *Regression IS sane*. I already checked for each patch. You should stop to worry. If you have that many comments also for the other 2 patches, which are only the base.... It might take a looooonng time. Please also consider this. Thanks. > > > On Tue, Jul 10, 2012 at 8:27 AM, Michael Paquier < > mic...@gm...> wrote: > >> Hi Ashutosh, >> >> Please find attached the wanted patches: >> 1) 20120710_grammar.patch, refactoring the grammar >> 2) 20120710_refactor.patch, refactoring CREATE TABLE code for functions >> related to distribution >> 3) 20120710_remotecopy.patch, refactoring the COPY code into remote COPY. >> In order to simplify my maintenance work and yours, I think you should >> have a look at those patches before looking at the redistribution work. >> Those patches are really simple, have no whitespace, no warnings, are >> independant to each other, and each of them is essential for the >> redistribution algorithm. 
>> As they are really simple, please let's accelerate the review of those 3 >> ones, commit them and move to the heart of the discussions. >> >> Thanks in advance. >> >> >> On Mon, Jul 9, 2012 at 9:22 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> >>> >>> On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> >>>> >>>> On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < >>>> ash...@en...> wrote: >>>> >>>>> Hi Michael, >>>>> I had a look at the patch. I mainly focused on the overall content of >>>>> the patch and importantly tests. Before I look at the redistribution code >>>>> thoroughly, I have few comments. >>>>> >>>>> There are many trailing white spaces in the patch. Please fix those, >>>>> they unnecessarily fail the automatic merges sometimes. You can do that >>>>> when you commit the patch. >>>>> >>>> Oh OK, I didn't notice. Do you have some places particularly in mind? >>>> >>> >>> Apply your patch on clean repository using git apply and it will show >>> you. >>> >>> >>>> >>>> >>>>> >>>>> Code >>>>> ==== >>>>> 1. There is a lot of code, which is refactoring existing code, >>>>> renaming functions, which is not necessarily related to redistribution >>>>> work. Can you please provide separate patches for this refactoring? We >>>>> should commit them separately. For example build_subcluster_data() has been >>>>> renamed (for good may be), but it makes sense if we do it separately. >>>>> Someone looking at the ALTER TABLE commit should not get overwhelmed by the >>>>> extraneous changes. >>>>> >>>> OK. The problem with the functions currently on master was that there >>>> name was not really generic and sometimes did not reflect their real >>>> functionality. So as now the plan is to use them in a more general way, I >>>> think there name is not going to change anymore. >>>> >>>> >>>>> >>>>> 2. Same is the case with the grammar changes. Please separate the >>>>> grammar changes related to pgxc_nodelist etc. into separate patch, although >>>>> it's because of ALTER TABLE you need to do those changes. >>>>> >>>> OK understood. >>>> >>>> >>>>> >>>>> Please get these patches reviewed as well, since I haven't looked at >>>>> the changes proper. >>>>> >>>> Understood, I'll make those 2 patches on tomorrow morning, not a big >>>> deal. >>>> >>>> >>>>> >>>>> Tests >>>>> ===== >>>>> 1. There is no need to test with huge data, that slows down >>>>> regression. For performance testing, you can create a separate test (not to >>>>> be included in regression), if you want. >>>>> >>>> That may be an idea. However you are right I'll limit the number of >>>> rows tested. >>>> >>>> >>>> >>>>> 2. We need tests, which will test the plan cache (in)validation upon >>>>> redistribution of data, tests for testing existing views working after the >>>>> redistribution. Please take a look at the PG alter table test for more such >>>>> scenarios. >>>> >>>> OK I'll add those scenarios. They will be included in xc_alter_table. >>>> >>>> >>>>> If you happen to add some performance tests, it would be also good to >>>>> test the sanity of concurrent transactions accessing the object/s being >>>>> redistributed. It's vital considering that such redistribution would run >>>>> for longer. >>>>> >>>> Yes, it would be nice to >>>> >>>> >>>> >>>>> 3. Instead of relying on count(*) to show sanity of the redistributed >>>>> data, you may use better aggregates like array_agg or sum(), avg() and >>>>> count(). 
I would prefer array_agg over others, since you can list all the >>>>> data values there. You will need aggregate's order by clause (Not that of >>>>> the SELECT). >>>>> 4. In the case of redistribution of table with index, you will need to >>>>> check the sanity of index after the redistribution by some means. >>>>> >>>> Do you have an idea of how to do that? Pick up some tests from postgres? >>>> >>> >>> Good question. But I don't have an answer (specifically for XC, since >>> the indexes are on datanodes). >>> >>> >>>> >>>> >>>>> 5. I did not understand the significance of the tests where you add >>>>> and drop column and redistribute the data. The SELECT after the >>>>> redistribution is not testing anything specific for the added/dropped >>>>> column. >>>>> >>>> The internal, let's say default layer, of distribution mechanism uses >>>> an internal COPY and it is important to do this check and correctly bypass >>>> the columns that are dropped. The SELECT is just here to check that data >>>> has been redistributed correctly. >>>> >>>> >>>> >>>>> 6. There are no testcases which would change the distribution type and >>>>> node list at the same time. Please add those. (I am assuming that these two >>>>> operations are possible together). >>>>> >>>> Yeah sorry, I have been working on that today and added some additional >>>> tests that can do that. >>>> They are in the bucket, just I didn't send the absolutely latest >>>> version. >>>> >>>> >>>>> 7. Negative testcases need to improved. >>>>> >>>> What are the negative test cases? It would be cool if you could precise. >>>> >>> >>> Tests which do negative testing ( >>> http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) >>> >>> >>>> >>>> >>>>> >>>>> Additional feature >>>>> ================== >>>>> It will be helpful to add the distribution information in the output >>>>> of \d command for tables. It will be good tool for tests to check whether >>>>> the catalogs have been updated correctly or not. Please add this feature >>>>> before we complete ALTER TABLE. It shouldn't take much time. Please provide >>>>> this as a separate patch. >>>>> >>>> +1. >>>> This is a good idea, and I recall we had this discussion a couple of >>>> months ago. However it is not directly related with redistribution. So it >>>> should be provided after committing the redistribution work I believe. >>>> >>> >>> It will help in testing the feature. For example, you can just do \d on >>> the redistributed table, to see if catalogs have been updated correctly or >>> not. So, it's better to do it before this ALTER TABLE, so that you can use >>> it in the tests. It should been done when the work related to the >>> subcluster was done, even before when XC was started :). Anyway, earlier >>> the better. >>> >>> >>>> Also, I think we shouldn't use ¥d as it will impact other applications >>>> like pgadmin for instance. We should use an extension of ¥d like for >>>> example ¥dZ. This is just a suggestion, I don't know what are the commands >>>> still not in use. >>>> >>> >>> \d is for describing a relation at bare minimum. In XC distribution >>> strategy becomes an integral part of a relation, and thus should be part of >>> the \d output. Applications using \d will need a change, but how many >>> applications connect via psql to fire commands (very less, I guess), so we >>> are not in much trouble. If one compares changing grammar of say CREATE >>> TABLE after the first release, would be more problematic that this one. 
>>> >>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > -- Michael Paquier http://michael.otacoo.com |
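Turning SortRelationDistributionNodes into a macro, as suggested in comment 5 above, would leave little more than the following; cmp_nodes_oid() is a placeholder name standing in for whatever comparator the XC sources already use:

    /* Placeholder comparator for qsort(); the real one may differ. */
    static int
    cmp_nodes_oid(const void *a, const void *b)
    {
        Oid     oa = *(const Oid *) a;
        Oid     ob = *(const Oid *) b;

        return (oa < ob) ? -1 : ((oa > ob) ? 1 : 0);
    }

    /* The former function collapses to a macro around qsort(). */
    #define SortRelationDistributionNodes(nodeoids, numnodes) \
        qsort((nodeoids), (numnodes), sizeof(Oid), cmp_nodes_oid)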
From: Ashutosh B. <ash...@en...> - 2012-07-10 09:41:58
|
Hi Michael, Comments on 20120710_refactor patch. 1. Please name the variables as local_hashalgorithm instead of hashalgorithm_loc (loc is also used mean the location info e.g Get RelationLocInfo()). 2. Please rename IsHashDistributable as IsTypeHashDistributable(). Same case with IsModuloDistributable(). 3. In function prologues, add information about input and output variables, esp. in case of GetRelationDistributionItems(). 4. This is not change in this patch, but good if you can accommodate it. At line 1102, there is a switch case, which has action only for a single case, so, it better be replaced with an "if". 5. SortRelationDistributionNodes, better be a macro, as it's not doing anything but call qsort(). 6. Following comment doesn't make much sense, please remove it. The executor state at the time of table creation and querying can be completely different. There is no connection 1218 * We should use session data because Executor uses it as well to run 1219 * commands on nodes. 7. In GetRelationDistributionNodes(), there are three places, node sorting function is called. Instead, you should just nodeoid array at these three places and call the sorting function at the end. In case we need to add another if case in that function, to get array of nodes in some other way, one has to remember to add the call to sort the nodes array, which can be avoided if you add the call to sort function at the end. BuildRelationDistributionNodes() sorts the nodeoids inside it, but you can take that call out of this function. 8. Probably not your code but, Function BuildRelationDistributionNodes() does a repalloc() for every new nodeoid it finds. Each repalloc is costly. Instead we can allocate memory large enough to contain all members of the list passed. If there are node repeated (which will be less likely), we will waste a few bytes, but won't be as expensive as calling repalloc(). 9. All the renamed functions are marked as "extern", do you really need them so? Also, I don't understand why these functions are located in heap.c? I hope regression is sane. On Tue, Jul 10, 2012 at 8:27 AM, Michael Paquier <mic...@gm...>wrote: > Hi Ashutosh, > > Please find attached the wanted patches: > 1) 20120710_grammar.patch, refactoring the grammar > 2) 20120710_refactor.patch, refactoring CREATE TABLE code for functions > related to distribution > 3) 20120710_remotecopy.patch, refactoring the COPY code into remote COPY. > In order to simplify my maintenance work and yours, I think you should > have a look at those patches before looking at the redistribution work. > Those patches are really simple, have no whitespace, no warnings, are > independant to each other, and each of them is essential for the > redistribution algorithm. > As they are really simple, please let's accelerate the review of those 3 > ones, commit them and move to the heart of the discussions. > > Thanks in advance. > > > On Mon, Jul 9, 2012 at 9:22 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> >> >> On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier < >> mic...@gm...> wrote: >> >>> >>> >>> On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> Hi Michael, >>>> I had a look at the patch. I mainly focused on the overall content of >>>> the patch and importantly tests. Before I look at the redistribution code >>>> thoroughly, I have few comments. >>>> >>>> There are many trailing white spaces in the patch. Please fix those, >>>> they unnecessarily fail the automatic merges sometimes. 
You can do that >>>> when you commit the patch. >>>> >>> Oh OK, I didn't notice. Do you have some places particularly in mind? >>> >> >> Apply your patch on clean repository using git apply and it will show you. >> >> >>> >>> >>>> >>>> Code >>>> ==== >>>> 1. There is a lot of code, which is refactoring existing code, renaming >>>> functions, which is not necessarily related to redistribution work. Can you >>>> please provide separate patches for this refactoring? We should commit them >>>> separately. For example build_subcluster_data() has been renamed (for good >>>> may be), but it makes sense if we do it separately. Someone looking at the >>>> ALTER TABLE commit should not get overwhelmed by the extraneous changes. >>>> >>> OK. The problem with the functions currently on master was that there >>> name was not really generic and sometimes did not reflect their real >>> functionality. So as now the plan is to use them in a more general way, I >>> think there name is not going to change anymore. >>> >>> >>>> >>>> 2. Same is the case with the grammar changes. Please separate the >>>> grammar changes related to pgxc_nodelist etc. into separate patch, although >>>> it's because of ALTER TABLE you need to do those changes. >>>> >>> OK understood. >>> >>> >>>> >>>> Please get these patches reviewed as well, since I haven't looked at >>>> the changes proper. >>>> >>> Understood, I'll make those 2 patches on tomorrow morning, not a big >>> deal. >>> >>> >>>> >>>> Tests >>>> ===== >>>> 1. There is no need to test with huge data, that slows down regression. >>>> For performance testing, you can create a separate test (not to be included >>>> in regression), if you want. >>>> >>> That may be an idea. However you are right I'll limit the number of >>> rows tested. >>> >>> >>> >>>> 2. We need tests, which will test the plan cache (in)validation upon >>>> redistribution of data, tests for testing existing views working after the >>>> redistribution. Please take a look at the PG alter table test for more such >>>> scenarios. >>> >>> OK I'll add those scenarios. They will be included in xc_alter_table. >>> >>> >>>> If you happen to add some performance tests, it would be also good to >>>> test the sanity of concurrent transactions accessing the object/s being >>>> redistributed. It's vital considering that such redistribution would run >>>> for longer. >>>> >>> Yes, it would be nice to >>> >>> >>> >>>> 3. Instead of relying on count(*) to show sanity of the redistributed >>>> data, you may use better aggregates like array_agg or sum(), avg() and >>>> count(). I would prefer array_agg over others, since you can list all the >>>> data values there. You will need aggregate's order by clause (Not that of >>>> the SELECT). >>>> 4. In the case of redistribution of table with index, you will need to >>>> check the sanity of index after the redistribution by some means. >>>> >>> Do you have an idea of how to do that? Pick up some tests from postgres? >>> >> >> Good question. But I don't have an answer (specifically for XC, since the >> indexes are on datanodes). >> >> >>> >>> >>>> 5. I did not understand the significance of the tests where you add and >>>> drop column and redistribute the data. The SELECT after the redistribution >>>> is not testing anything specific for the added/dropped column. >>>> >>> The internal, let's say default layer, of distribution mechanism uses an >>> internal COPY and it is important to do this check and correctly bypass the >>> columns that are dropped. 
The SELECT is just here to check that data has >>> been redistributed correctly. >>> >>> >>> >>>> 6. There are no testcases which would change the distribution type and >>>> node list at the same time. Please add those. (I am assuming that these two >>>> operations are possible together). >>>> >>> Yeah sorry, I have been working on that today and added some additional >>> tests that can do that. >>> They are in the bucket, just I didn't send the absolutely latest version. >>> >>> >>>> 7. Negative testcases need to improved. >>>> >>> What are the negative test cases? It would be cool if you could precise. >>> >> >> Tests which do negative testing ( >> http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) >> >> >>> >>> >>>> >>>> Additional feature >>>> ================== >>>> It will be helpful to add the distribution information in the output of >>>> \d command for tables. It will be good tool for tests to check whether the >>>> catalogs have been updated correctly or not. Please add this feature before >>>> we complete ALTER TABLE. It shouldn't take much time. Please provide this >>>> as a separate patch. >>>> >>> +1. >>> This is a good idea, and I recall we had this discussion a couple of >>> months ago. However it is not directly related with redistribution. So it >>> should be provided after committing the redistribution work I believe. >>> >> >> It will help in testing the feature. For example, you can just do \d on >> the redistributed table, to see if catalogs have been updated correctly or >> not. So, it's better to do it before this ALTER TABLE, so that you can use >> it in the tests. It should been done when the work related to the >> subcluster was done, even before when XC was started :). Anyway, earlier >> the better. >> >> >>> Also, I think we shouldn't use ¥d as it will impact other applications >>> like pgadmin for instance. We should use an extension of ¥d like for >>> example ¥dZ. This is just a suggestion, I don't know what are the commands >>> still not in use. >>> >> >> \d is for describing a relation at bare minimum. In XC distribution >> strategy becomes an integral part of a relation, and thus should be part of >> the \d output. Applications using \d will need a change, but how many >> applications connect via psql to fire commands (very less, I guess), so we >> are not in much trouble. If one compares changing grammar of say CREATE >> TABLE after the first release, would be more problematic that this one. >> >> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Koichi S. <koi...@gm...> - 2012-07-10 04:51:43
|
Yes. Although we don't have to care application partitioning based upon the distribution key, it's a good idea to make all the coordinator workload as even as possible. In the case of DBT-1, we ran several DBT-1 process, each produces random transaction but goes to specific coordinator. I think pgbench can do the similar. Regards; ---------- Koichi Suzuki 2012/7/10 Ashutosh Bapat <ash...@en...>: > Hi Shankar, > Will it be possible for you to change the pgbench code to dynamically fire > on all available coordinators? > > Since we use modified DBT-1 for our benchmarking, we haven't got to the > point where we can modify pg_bench to suite XC. But that's something, we > will welcome if anybody is interested. > > > On Mon, Jul 9, 2012 at 9:41 PM, Shankar Hariharan > <har...@ya...> wrote: >> >> Thanks Ashutosh. You are right, while running this test i just had pgbench >> running against one coordinator. Looks like pgbench by itself may not be an >> apt tool for this kind of testing, I will instead run pgbench's underlying >> sql script from cmdline against either coordinators. Thanks for that tip. >> >> I got a lot of input on my problem from a lot of folks on the list, the >> feedback is much appreciated. Thanks everybody! >> >> On max_prepared_transactions, I will factor in the number of coordinators >> and the max_connections on each coordinator while arriving at a figure. >> Will also try out Koichi Suzuki's suggestion to have multiple NICs on the >> GTM. I will post my findings here for the same cluster configuration as >> before. >> >> thanks, >> Shankar >> >> ________________________________ >> From: Ashutosh Bapat <ash...@en...> >> To: Shankar Hariharan <har...@ya...> >> Cc: "pos...@li..." >> <pos...@li...> >> Sent: Sunday, July 8, 2012 11:02 PM >> >> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >> Hi Shankar, >> You have got answers to the prepared transaction problem, I guess. I have >> something else below. >> >> On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan >> <har...@ya...> wrote: >> >> As planned I ran some tests using PGBench on this setup : >> >> Node 1 - Coord1, Datanode1, gtm-proxy1 >> Node 2- Coord2, Datanode2, gtm-proxy2 >> Node 3- Datanode3, gtm >> >> I was connecting via Coord1 for these tests: >> - scale factor of 30 used >> - tests run using the following input parameters for pgbench: >> >> >> Try connecting to both the coordinators, it should give you better >> performance, esp, when you are using distributed tables. With distributed >> tables, coordinator gets involved in query execution more than that in the >> case of replicated tables. So, balancing load across two coordinators would >> help. >> >> >> >> Clients Threads Duration Transactions >> 1 1 100 6204 >> 2 2 100 9960 >> 4 4 100 12880 >> 6 6 100 1676 >> >> >> >> 8 >> 8 8 100 19758 >> 10 10 100 21944 >> 12 12 100 20674 >> >> The run went well until the 8 clients. I started seeing errors on 10 >> clients onwards and eventually the 14 client run has been hanging around for >> over an hour now. 
The errors I have been seeing on console are the following >> : >> >> pgbench console : >> Client 8 aborted in state 12: ERROR: GTM error, could not obtain snapshot >> Client 0 aborted in state 13: ERROR: maximum number of prepared >> transactions reached >> Client 7 aborted in state 13: ERROR: maximum number of prepared >> transactions reached >> Client 11 aborted in state 13: ERROR: maximum number of prepared >> transactions reached >> Client 9 aborted in state 13: ERROR: maximum number of prepared >> transactions reached >> >> node console: >> ERROR: GTM error, could not obtain snapshot >> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >> VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); >> ERROR: maximum number of prepared transactions reached >> HINT: Increase max_prepared_transactions (currently 10). >> STATEMENT: PREPARE TRANSACTION 'T201428' >> ERROR: maximum number of prepared transactions reached >> STATEMENT: END; >> ERROR: maximum number of prepared transactions reached >> STATEMENT: END; >> ERROR: maximum number of prepared transactions reached >> STATEMENT: END; >> ERROR: maximum number of prepared transactions reached >> STATEMENT: END; >> ERROR: GTM error, could not obtain snapshot >> STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) >> VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); >> >> I was also watching the processes on each node and see the following for >> the 14 client run: >> >> >> Node1 : >> postgres 25571 10511 0 04:41 ? 00:00:02 postgres: postgres >> postgres ::1(33481) TRUNCATE TABLE waiting >> postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres >> postgres pgbench-address (50388) TRUNCATE TABLE >> >> Node2: >> postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres >> postgres coord1-address(57357) idle in transaction >> >> Node3: >> postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres >> postgres coord1-address(51406) TRUNCATE TABLE waiting >> >> >> I was going to restart the processes on all nodes and start over but did >> not want to lose this data as it could be useful information. >> >> Any explanation on the above issue is much appreciated. I will try the >> next run with a higher value set for max_prepared_transactions. Any >> recommendations for a good value on this front? >> >> thanks, >> Shankar >> >> >> ________________________________ >> From: Shankar Hariharan <har...@ya...> >> To: Ashutosh Bapat <ash...@en...> >> Cc: "pos...@li..." >> <pos...@li...> >> Sent: Friday, July 6, 2012 8:22 AM >> >> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >> Hi Ashutosh, >> I was trying to size the load on a server and was wondering if a GTM >> could be shared w/o much performance overhead between a small number of >> datanodes and coordinators. I will post my findings here. >> thanks, >> Shankar >> >> ________________________________ >> From: Ashutosh Bapat <ash...@en...> >> To: Shankar Hariharan <har...@ya...> >> Cc: "pos...@li..." >> <pos...@li...> >> Sent: Friday, July 6, 2012 12:25 AM >> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >> >> Hi Shankar, >> Running gtm-proxy has shown to improve the performance, because it lessens >> the load on GTM, by serving requests locally. Why do you want the >> coordinators to connect directly to the GTM? Are you seeing any performance >> improvement from doing that? >> >> On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan >> <har...@ya...> wrote: >> >> Follow up to earlier email. 
In the setup described below, can I avoid >> using a gtm-proxy? That is, can I just simply point coordinators to the one >> gtm running on node 3 ? >> My initial plan was to just run the gtm on node 3 then I thought I could >> try a datanode without a local coordinator which was why I put these two >> together on node 3. >> thanks, >> Shankar >> >> ________________________________ >> From: Shankar Hariharan <har...@ya...> >> To: "pos...@li..." >> <pos...@li...> >> Sent: Thursday, July 5, 2012 11:35 PM >> Subject: Question on multiple coordinators >> >> Hello, >> >> Am trying out XC 1.0 in the following configuraiton. >> Node 1 - Coord1, Datanode1, gtm-proxy1 >> Node 2- Coord2, Datanode2, gtm-proxy2 >> Node 3- Datanode3, gtm >> >> I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In >> addition I missed the pg_hba edit as well. So the first table T1 that I >> created for distribution from Coord1 was not "visible| from Coord2 but was >> on all the data nodes. >> I tried to get Coord2 backinto business in various ways but the first >> table I created refused to show up on Coord2 : >> - edit pg_hba and add node on both coord1 and 2. Then run select >> pgxc_pool_reload(); >> - restart coord 1 and 2 >> - drop node c2 from c1 and c1 from c2 and add them back followed by select >> pgxc_pool_reload(); >> >> So I tried to create the same table T1 from Coord2 to observe behavior and >> it did not like it clearly as all nodes it "wrote" to reported that the >> table already existed which was good. At this point I could understand that >> Coord2 and Coord1 are not talking alright so I created a new table from >> coord1 with replication. This table was visible from both now. >> >> Question is should I expect to see the first table, let me call it T1 >> after a while from Coord2 also? >> >> >> thanks, >> Shankar >> >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> >> >> >> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
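As a rough illustration of keeping the coordinator workload even from the client side, a benchmark driver can hold the coordinator host/port pairs in arrays and hand them out in rotation for every new connection. This is only a sketch with invented names (coord_hosts, coord_ports, pick_coordinator), not code taken from pgbench or DBT-1:

    /* Rotate over the configured coordinators so connections spread evenly. */
    static const char *coord_hosts[] = {"node1", "node2"};
    static const char *coord_ports[] = {"5432",  "5432"};
    static const int   n_coords = 2;
    static int         next_conn = 0;

    static void
    pick_coordinator(const char **host, const char **port)
    {
        int     i = next_conn++ % n_coords;   /* deterministic round robin */

        *host = coord_hosts[i];
        *port = coord_ports[i];
    }

Whichever coordinator a client lands on, every coordinator ends up preparing transactions on the datanodes, so max_prepared_transactions on the datanodes needs to be sized with the number of coordinators and their max_connections in mind; the "maximum number of prepared transactions reached" errors quoted above come straight from that limit.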
From: Ashutosh B. <ash...@en...> - 2012-07-10 04:25:26
Hi Shankar, Will it be possible for you to change the pgbench code to dynamically fire on all available coordinators? Since we use modified DBT-1 for our benchmarking, we haven't got to the point where we can modify pg_bench to suite XC. But that's something, we will welcome if anybody is interested. On Mon, Jul 9, 2012 at 9:41 PM, Shankar Hariharan < har...@ya...> wrote: > Thanks Ashutosh. You are right, while running this test i just had pgbench > running against one coordinator. Looks like pgbench by itself may not be an > apt tool for this kind of testing, I will instead run pgbench's underlying > sql script from cmdline against either coordinators. Thanks for that tip. > > I got a lot of input on my problem from a lot of folks on the list, the > feedback is much appreciated. Thanks everybody! > > On max_prepared_transactions, I will factor in the number of coordinators > and the max_connections on each coordinator while arriving at a figure. > Will also try out Koichi Suzuki's suggestion to have multiple NICs on the > GTM. I will post my findings here for the same cluster configuration as > before. > > thanks, > Shankar > > ------------------------------ > *From:* Ashutosh Bapat <ash...@en...> > *To:* Shankar Hariharan <har...@ya...> > *Cc:* "pos...@li..." < > pos...@li...> > *Sent:* Sunday, July 8, 2012 11:02 PM > > *Subject:* Re: [Postgres-xc-developers] Question on gtm-proxy > > Hi Shankar, > You have got answers to the prepared transaction problem, I guess. I have > something else below. > > On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan < > har...@ya...> wrote: > > As planned I ran some tests using PGBench on this setup : > > Node 1 - Coord1, Datanode1, gtm-proxy1 > Node 2- Coord2, Datanode2, gtm-proxy2 > Node 3- Datanode3, gtm > > I was connecting via Coord1 for these tests: > - scale factor of 30 used > - tests run using the following input parameters for pgbench: > > > Try connecting to both the coordinators, it should give you better > performance, esp, when you are using distributed tables. With distributed > tables, coordinator gets involved in query execution more than that in the > case of replicated tables. So, balancing load across two coordinators would > help. > > > > Clients Threads Duration Transactions > 1 1 100 6204 > 2 2 100 9960 > 4 4 100 12880 > 6 6 100 1676 > > > > 8 > 8 8 100 19758 > 10 10 100 21944 > 12 12 100 20674 > > The run went well until the 8 clients. I started seeing errors on 10 > clients onwards and eventually the 14 client run has been hanging around > for over an hour now. The errors I have been seeing on console are the > following : > > pgbench console : > Client 8 aborted in state 12: ERROR: GTM error, could not obtain snapshot > Client 0 aborted in state 13: ERROR: maximum number of prepared > transactions reached > Client 7 aborted in state 13: ERROR: maximum number of prepared > transactions reached > Client 11 aborted in state 13: ERROR: maximum number of prepared > transactions reached > Client 9 aborted in state 13: ERROR: maximum number of prepared > transactions reached > > node console: > ERROR: GTM error, could not obtain snapshot > STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) > VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); > ERROR: maximum number of prepared transactions reached > HINT: Increase max_prepared_transactions (currently 10). 
> STATEMENT: PREPARE TRANSACTION 'T201428' > ERROR: maximum number of prepared transactions reached > STATEMENT: END; > ERROR: maximum number of prepared transactions reached > STATEMENT: END; > ERROR: maximum number of prepared transactions reached > STATEMENT: END; > ERROR: maximum number of prepared transactions reached > STATEMENT: END; > ERROR: GTM error, could not obtain snapshot > STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) > VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); > > I was also watching the processes on each node and see the following for > the 14 client run: > > > Node1 : > postgres 25571 10511 0 04:41 ? 00:00:02 postgres: postgres > postgres ::1(33481) TRUNCATE TABLE waiting > postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres > postgres pgbench-address (50388) TRUNCATE TABLE > > Node2: > postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres > postgres coord1-address(57357) idle in transaction > > Node3: > postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres > postgres coord1-address(51406) TRUNCATE TABLE waiting > > > I was going to restart the processes on all nodes and start over but did > not want to lose this data as it could be useful information. > > Any explanation on the above issue is much appreciated. I will try the > next run with a higher value set for max_prepared_transactions. Any > recommendations for a good value on this front? > > thanks, > Shankar > > > ------------------------------ > *From:* Shankar Hariharan <har...@ya...> > *To:* Ashutosh Bapat <ash...@en...> > *Cc:* "pos...@li..." < > pos...@li...> > *Sent:* Friday, July 6, 2012 8:22 AM > > *Subject:* Re: [Postgres-xc-developers] Question on gtm-proxy > > Hi Ashutosh, > I was trying to size the load on a server and was wondering if a GTM > could be shared w/o much performance overhead between a small number of > datanodes and coordinators. I will post my findings here. > thanks, > Shankar > > ------------------------------ > *From:* Ashutosh Bapat <ash...@en...> > *To:* Shankar Hariharan <har...@ya...> > *Cc:* "pos...@li..." < > pos...@li...> > *Sent:* Friday, July 6, 2012 12:25 AM > *Subject:* Re: [Postgres-xc-developers] Question on gtm-proxy > > Hi Shankar, > Running gtm-proxy has shown to improve the performance, because it lessens > the load on GTM, by serving requests locally. Why do you want the > coordinators to connect directly to the GTM? Are you seeing any performance > improvement from doing that? > > On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan < > har...@ya...> wrote: > > Follow up to earlier email. In the setup described below, can I avoid > using a gtm-proxy? That is, can I just simply point coordinators to the one > gtm running on node 3 ? > My initial plan was to just run the gtm on node 3 then I thought I could > try a datanode without a local coordinator which was why I put these two > together on node 3. > thanks, > Shankar > > ------------------------------ > *From:* Shankar Hariharan <har...@ya...> > *To:* "pos...@li..." < > pos...@li...> > *Sent:* Thursday, July 5, 2012 11:35 PM > *Subject:* Question on multiple coordinators > > Hello, > > Am trying out XC 1.0 in the following configuraiton. > Node 1 - Coord1, Datanode1, gtm-proxy1 > Node 2- Coord2, Datanode2, gtm-proxy2 > Node 3- Datanode3, gtm > > I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In > addition I missed the pg_hba edit as well. 
So the first table T1 that I > created for distribution from Coord1 was not "visible| from Coord2 but > was on all the data nodes. > I tried to get Coord2 backinto business in various ways but the first > table I created refused to show up on Coord2 : > - edit pg_hba and add node on both coord1 and 2. Then run select > pgxc_pool_reload(); > - restart coord 1 and 2 > - drop node c2 from c1 and c1 from c2 and add them back followed by select > pgxc_pool_reload(); > > So I tried to create the same table T1 from Coord2 to observe behavior > and it did not like it clearly as all nodes it "wrote" to reported that the > table already existed which was good. At this point I could understand that > Coord2 and Coord1 are not talking alright so I created a new table from > coord1 with replication. This table was visible from both now. > > Question is should I expect to see the first table, let me call it T1 > after a while from Coord2 also? > > > thanks, > Shankar > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > > > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
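Shankar's plan in the message above is to drive the test by running pgbench's underlying SQL from the command line against either coordinator instead of patching pgbench. For reference, a sketch of the default TPC-B-like transaction pgbench issues per client, with the :tid/:bid/:aid/:delta script variables replaced by the literal values that appear in the error log earlier in the thread (the exact script shipped with a particular pgbench build may differ slightly):

BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + (-817) WHERE aid = 1888413;
SELECT abalance FROM pgbench_accounts WHERE aid = 1888413;
UPDATE pgbench_tellers SET tbalance = tbalance + (-817) WHERE tid = 253;
UPDATE pgbench_branches SET bbalance = bbalance + (-817) WHERE bid = 26;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP);
END;

Saving this (with randomized ids and deltas) to a file and feeding it through psql alternately to coord1 and coord2 approximates the load balancing across coordinators discussed above without modifying pgbench itself.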
From: Koichi S. <koi...@gm...> - 2012-07-10 01:54:16
I basically agree on the idea. I also think Michael needs this kind of extension to feedback trigger target rows. We should design NoticeResponse extensible enough for various usage. Regards; ---------- Koichi Suzuki 2012/7/10 Abbas Butt <abb...@en...>: > Here is the update on the issue. > > It was decided that the changes done by the data nodes in the command id > should be communicated back to the coordinator and that the coordinator > should choose the largest of all the received values as the next command id. > > It was suggested that we should check that a skipped value of command id > should not create a problem for the next operations on the table. I have > verified both by studying the code and by actually changing the function > CommandCounterIncrement to increment the command id by 3 and running > regression. It worked fine so a hole in command id is not a problem. > > Next it was suggested that we should use the mechanism currently in place to > send # of tuples affected by a statement to communicate the changed command > id to the coordinator. > Please refer to this link in the documentation > http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html > Note that there is no such message format that exists in the current over > the wire protocol to communicate the # of tuples affected by a statement. > The libpq functions that we might suspect do the same are PQntuples and > PQcmdTuples. PQntuples simply returns ntups member of PGresult, where as > PQcmdTuples extracts the # of tuples affected from the CommandComplete 'C' > message string. We cannot use these mechanisms for our purpose. > > I evaluated the use of NoticeResponse 'N' for sending changed command id but > the message format of NoticeResponse mandates the use of certain fields > which will make our sent messages un-necessarily bulky and would consume > network bandwidth for no reason. > > I therefore suggest the we use a new message for communicating XC specific > information from data node to coordinator. Currently we will use it for > command id but we will design the message format flexible enough to > accommodate future XC requirements. Whenever the data node increments > command id we will send the information to the coordinator and > handle_response function in execRemote.c would be changed to accommodate new > message. Since coordinators will never use the new message therefore the > existing clients do not need to bother. > > Comments or suggestions are welcome. 
> > Regards > > > On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en...> > wrote: >> >> While fixing the regression failures resulting from the changes done by >> the patch I was able to fix all except this test case >> >> set enforce_two_phase_commit = off; >> >> CREATE TEMP TABLE users ( >> id INT PRIMARY KEY, >> name VARCHAR NOT NULL >> ) DISTRIBUTE BY REPLICATION; >> >> INSERT INTO users VALUES (1, 'Jozko'); >> INSERT INTO users VALUES (2, 'Ferko'); >> INSERT INTO users VALUES (3, 'Samko'); >> CREATE TEMP TABLE tasks ( >> id INT PRIMARY KEY, >> owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL, >> checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL >> ) DISTRIBUTE BY REPLICATION; >> >> INSERT INTO tasks VALUES (1,1,NULL,NULL); >> INSERT INTO tasks VALUES (2,2,2,NULL); >> INSERT INTO tasks VALUES (3,3,3,3); >> >> BEGIN; >> >> UPDATE tasks set id=id WHERE id=2; >> SELECT * FROM tasks; >> >> DELETE FROM users WHERE id = 2; >> SELECT * FROM tasks; >> >> COMMIT; >> >> The obtained output from the last select statement is >> >> id | owner | worker | checked_by >> ----+-------+--------+------------ >> 1 | 1 | | >> 3 | 3 | 3 | 3 >> 2 | 2 | 2 | >> (3 rows) >> >> where as the expected output is >> >> id | owner | worker | checked_by >> ----+-------+--------+------------ >> 1 | 1 | | >> 3 | 3 | 3 | 3 >> 2 | | | >> (3 rows) >> >> Note that the owner and worker have been set to null due to "ON DELETE SET >> NULL". >> >> Here is the reason why this does not work properly. Consider the last >> transaction >> >> BEGIN; >> >> UPDATE tasks set id=id WHERE id=2; >> SELECT * FROM tasks; >> >> DELETE FROM users WHERE id = 2; >> SELECT * FROM tasks; >> >> COMMIT; >> >> Here are the command id values the coordinator sends to the data node >> >> 0 for the first update that gets incremented to 1 because this is a DML >> and needs to consume a command id >> 1 for the first select that remains 1 since it is not required to be >> consumed. >> 1 for the delete statement that gets incremented to 2 because it is a DML >> and 2 for the last select. >> >> Now this is what happens on the data node >> >> When the data node receives the first update with command id 0, it >> increments it once due to the update itself and once due to the update run >> because of "ON UPDATE CASCADE". Hence the command id at the end of update on >> data node is 2. >> The first select comes to data node with command id 1, which is incorrect. >> The user's intention is to see data after update and its command id should >> be 2. >> Now delete comes with command id 1, and data node increments it once due >> to the delete itself and once due to the update run because of "ON DELETE >> SET NULL", hence the command id at the end of delete is 3. >> Coordinator now sends last select with command id 2, which is again >> incorrect since user's intention is to see data after delete and select >> should have been sent to data node with command id 3 or 4. >> >> Every time data node increments command id due to any statements run >> implicitly either because of the constraints or triggers, this scheme of >> sending command ids to data node from coordinator to solve fetch problems >> would fail. >> >> Datanode can have a trigger e.g. inserting rows thrice on every single >> insert and would increment command id on every insert. Therefore this design >> cannot work. 
>> >> Either we have to synchronize command ids between datanode and coordinator >> through GTM >> OR >> We will have to send the DECLARE CURSOR down to the datanode. In this case >> however we will not be able to send the cursor query as it is because the >> query might contain a join on two tables which exist on a disjoint set of >> data nodes. >> >> Comments or suggestions are welcome. >> >> >> >> On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt <abb...@en...> >> wrote: >>> >>> Thanks for your comments. >>> >>> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat >>> <ash...@en...> wrote: >>>> >>>> Hi Abbas, >>>> I have few comments to make >>>> 1. With this patch there are two variables for having command Id, that >>>> is going to cause confusion and will be a maintenance burden, might be error >>>> prone. Is it possible to use a single variable instead of two? >>> >>> >>> Are you talking about receivedCommandId and currentCommandId? If yes, I >>> would prefer not having a packet received from coordinator overwrite the >>> currentCommandId at data node, because I am not 100% sure about the life >>> time of currentCommandId, I might overwrite it before time. It would be safe >>> to let currentCommandId as is unless we are compelled to get the next >>> command ID, and have the received command id take priority at that time. >>> >>>> >>>> Right now there is some code which is specific to cursors in your patch. >>>> If you can plug the coordinator command id somehow into currentCommandId, >>>> you won't need that code and any other code which needs coordinator command >>>> ID will be automatically taken care of. >>> >>> >>> That code is required to solve a problem. Consider this case when a >>> coordinator received this transaction >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(2); >>> fetch all from c50; >>> COMMIT; >>> >>> While sending select to the data node in response to a fetch we need to >>> know what was the command ID of the declare cursor statement and we need to >>> send that command ID to the data node for this particular fetch. This is the >>> main idea behind this solution. >>> >>> The first insert goes to the data node with command id 0, the second >>> insert goes with 2. Command ID 1 is consumed by declare cursor. When >>> coordinator sees fetch it needs to send select to the data node with command >>> ID 1 rather than 3. >>> >>> >>>> >>>> 2. A non-transaction on coordinator can spawn tranasactions on datanode >>>> or subtransactions (if there is already a transaction running). Does your >>>> patch handle that case? >>> >>> >>> No and it does not need to, because that case has no known problems that >>> we need to solve. I don't think my patch would impact any such case but I >>> will analyze any failures that I may get in regressions. >>> >>>> >>>> Should we do more thorough research in the transaction management, esp. >>>> to see the impact of getting same command id for two commands on the >>>> datanode? >>> >>> >>> If we issue two commands with the same command ID then we will definitely >>> have visibility issues according to the rules I have already explained. But >>> we will not have two commands sent to the data node with same command id. >>> >>>> >>>> >>>> >>>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt >>>> <abb...@en...> wrote: >>>>> >>>>> Hi Ashutosh, >>>>> Here are the results with the val column, Thanks. 
>>>>> >>>>> test=# drop table mvcc_demo; >>>>> DROP TABLE >>>>> test=# >>>>> test=# create table mvcc_demo (val int); >>>>> CREATE TABLE >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> test=# >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>>> combo id would be different >>>>> DELETE 0 >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 0 >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 0 >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80689 | 0 | 3 | f >>>>> 80689 | 0 | 4 | f >>>>> 80689 | 0 | 5 | f >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80689 | 0 | 3 | 3 | 1 >>>>> 80689 | 0 | 4 | 4 | 2 >>>>> 80689 | 0 | 5 | 5 | 3 >>>>> >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 3 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80689 | 80689 | 0 | t >>>>> 80689 | 80689 | 1 | t >>>>> 80689 | 80689 | 2 | t >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> ------+------+------+------+----- >>>>> (0 rows) >>>>> >>>>> >>>>> test=# >>>>> test=# END; >>>>> COMMIT >>>>> test=# >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80693 | 0 | 0 | f >>>>> 80693 | 0 | 1 | f >>>>> 80693 | 0 | 2 | f >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80693 | 0 | 0 | 0 | 1 >>>>> 80693 | 0 | 1 | 1 | 2 >>>>> 80693 | 0 | 2 | 2 | 3 >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# UPDATE mvcc_demo SET val = 10; >>>>> >>>>> UPDATE 3 >>>>> test=# >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS 
cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80693 | 80693 | 0 | t >>>>> 80693 | 80693 | 1 | t >>>>> 80693 | 80693 | 2 | t >>>>> 80693 | 0 | 3 | f >>>>> 80693 | 0 | 3 | f >>>>> 80693 | 0 | 3 | f >>>>> (6 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80693 | 0 | 3 | 3 | 10 >>>>> 80693 | 0 | 3 | 3 | 10 >>>>> 80693 | 0 | 3 | 3 | 10 >>>>> (3 rows) >>>>> >>>>> >>>>> test=# >>>>> test=# END; >>>>> COMMIT >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- From one psql issue >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80699 | 0 | 0 | f >>>>> (1 row) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80699 | 0 | 0 | 0 | 1 >>>>> (1 row) >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> test=# -- From another issue >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80699 | 0 | 0 | f >>>>> 80700 | 0 | 0 | f >>>>> 80700 | 0 | 1 | f >>>>> 80700 | 0 | 2 | f >>>>> (4 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80699 | 0 | 0 | 0 | 1 >>>>> 80700 | 0 | 0 | 0 | 2 >>>>> 80700 | 0 | 1 | 1 | 3 >>>>> 80700 | 0 | 2 | 2 | 4 >>>>> (4 rows) >>>>> >>>>> test=# >>>>> test=# UPDATE mvcc_demo SET val = 10; >>>>> >>>>> UPDATE 4 >>>>> test=# >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80700 | 80700 | 0 | t >>>>> 80700 | 80700 | 1 | t >>>>> 80700 | 80700 | 2 | t >>>>> 80699 | 80700 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> (8 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin 
| xmax | cmin | cmax | val >>>>> -------+------+------+------+----- >>>>> 80700 | 0 | 3 | 3 | 10 >>>>> 80700 | 0 | 3 | 3 | 10 >>>>> 80700 | 0 | 3 | 3 | 10 >>>>> 80700 | 0 | 3 | 3 | 10 >>>>> (4 rows) >>>>> >>>>> >>>>> >>>>> >>>>> test=# -- Before finishing this, issue these from the first psql >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80700 | 80700 | 0 | t >>>>> 80700 | 80700 | 1 | t >>>>> 80700 | 80700 | 2 | t >>>>> 80699 | 80700 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> 80700 | 0 | 3 | f >>>>> (8 rows) >>>>> >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>>> xmin | xmax | cmin | cmax | val >>>>> -------+-------+------+------+----- >>>>> 80699 | 80700 | 3 | 3 | 1 >>>>> (1 row) >>>>> >>>>> test=# end; >>>>> COMMIT >>>>> >>>>> >>>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier >>>>> <mic...@gm...> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, so >>>>>> be sure to send it to remote Datanodes really only if necessary. >>>>>> You should put more severe conditions blocking this function cid can >>>>>> easily get incremented in Postgres. >>>>>> >>>>>> Regards, >>>>>> >>>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt >>>>>> <abb...@en...> wrote: >>>>>>> >>>>>>> PFA a WIP patch implementing the design presented earlier. >>>>>>> The patch is WIP because it still has and FIXME and it shows some >>>>>>> regression failures that need to be fixed, but other than that it confirms >>>>>>> that the suggested design would work fine. 
The following test cases now work >>>>>>> fine >>>>>>> >>>>>>> drop table tt1; >>>>>>> create table tt1(f1 int) distribute by replication; >>>>>>> >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> insert into tt1 values(2); >>>>>>> fetch all from c50; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> BEGIN; >>>>>>> >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> insert into tt1 values(1); >>>>>>> >>>>>>> insert into tt1 values(2); >>>>>>> fetch all from c50; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> insert into tt1 values(2); >>>>>>> >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> insert into tt1 values(3); >>>>>>> >>>>>>> fetch all from c50; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> insert into tt1 values(2); >>>>>>> declare c51 cursor for select * from tt1; >>>>>>> insert into tt1 values(3); >>>>>>> fetch all from c50; >>>>>>> fetch all from c51; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> declare c50 cursor for select * from tt1; >>>>>>> declare c51 cursor for select * from tt1; >>>>>>> insert into tt1 values(2); >>>>>>> insert into tt1 values(3); >>>>>>> fetch all from c50; >>>>>>> fetch all from c51; >>>>>>> COMMIT; >>>>>>> truncate table tt1; >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt >>>>>>> <abb...@en...> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> In a multi-statement transaction each statement is given a command >>>>>>>> identifier >>>>>>>> starting from zero and incrementing for each statement. >>>>>>>> These command indentifers are required for extra tracking because >>>>>>>> each >>>>>>>> statement has its own visibility rules with in the transaction. >>>>>>>> For example, a cursor’s contents must remain unchanged even if later >>>>>>>> statements in the >>>>>>>> same transaction modify rows. Such tracking is implemented using >>>>>>>> system command id >>>>>>>> columns cmin/cmax, which is internally actually is a single column. >>>>>>>> >>>>>>>> cmin/cmax come into play in case of multi-statement transactions >>>>>>>> only, >>>>>>>> they are both zero otherwise. >>>>>>>> >>>>>>>> cmin "The command identifier of the statement within the inserting >>>>>>>> transaction." >>>>>>>> cmax "The command identifier of the statement within the deleting >>>>>>>> transaction." 
>>>>>>>> >>>>>>>> Here are the visibility rules (taken from comments of tqual.c) >>>>>>>> >>>>>>>> ( // A heap tuple is valid >>>>>>>> "now" iff >>>>>>>> Xmin == my-transaction && // inserted by the current >>>>>>>> transaction >>>>>>>> Cmin < my-command && // before this command, and >>>>>>>> ( >>>>>>>> Xmax is null || // the row has not been >>>>>>>> deleted, or >>>>>>>> ( >>>>>>>> Xmax == my-transaction && // it was deleted by the >>>>>>>> current transaction >>>>>>>> Cmax >= my-command // but not before this >>>>>>>> command, >>>>>>>> ) >>>>>>>> ) >>>>>>>> ) >>>>>>>> || // or >>>>>>>> ( >>>>>>>> Xmin is committed && // the row was inserted by >>>>>>>> a committed transaction, and >>>>>>>> ( >>>>>>>> Xmax is null || // the row has not been >>>>>>>> deleted, or >>>>>>>> ( >>>>>>>> Xmax == my-transaction && // the row is being deleted >>>>>>>> by this transaction >>>>>>>> Cmax >= my-command) || // but it's not deleted >>>>>>>> "yet", or >>>>>>>> ( >>>>>>>> Xmax != my-transaction && // the row was deleted by >>>>>>>> another transaction >>>>>>>> Xmax is not committed // that has not been >>>>>>>> committed >>>>>>>> ) >>>>>>>> ) >>>>>>>> ) >>>>>>>> ) >>>>>>>> >>>>>>>> Because cmin and cmax are internally a single system column, >>>>>>>> it is therefore not possible to simply record the status of a row >>>>>>>> that is created and expired in the same multi-statement transaction. >>>>>>>> For that reason, a special combo command id is created that >>>>>>>> references >>>>>>>> a local memory hash that contains the actual cmin and cmax values. >>>>>>>> It means that if combo id is being used the number we are seeing >>>>>>>> would not be the cmin or cmax it will be an index into a local >>>>>>>> array that contains a structure with has the actual cmin and cmax >>>>>>>> values. >>>>>>>> >>>>>>>> The following queries (taken mostly from >>>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>>>>>>> use the contrib module pageinspect, which allows >>>>>>>> visibility of internal heap page structures and all stored rows, >>>>>>>> including those not visible in the current snapshot. >>>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) 
>>>>>>>> >>>>>>>> We are exploring 3 examples here: >>>>>>>> 1) INSERT & DELETE in a single transaction >>>>>>>> 2) INSERT & UPDATE in a single transaction >>>>>>>> 3) INSERT from two different transactions & UPDATE from one >>>>>>>> >>>>>>>> test=# drop table mvcc_demo; >>>>>>>> DROP TABLE >>>>>>>> test=# >>>>>>>> test=# create table mvcc_demo (val int); >>>>>>>> CREATE TABLE >>>>>>>> test=# >>>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>>> TRUNCATE TABLE >>>>>>>> test=# >>>>>>>> test=# BEGIN; >>>>>>>> BEGIN >>>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>>>>>> combo id would be different >>>>>>>> DELETE 0 >>>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>>> DELETE 0 >>>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>>> DELETE 0 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>>> INSERT 0 1 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+------+-----------+------------- >>>>>>>> 80685 | 0 | 3 | f >>>>>>>> 80685 | 0 | 4 | f >>>>>>>> 80685 | 0 | 5 | f >>>>>>>> (3 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>>> DELETE 3 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+-------+-----------+------------- >>>>>>>> 80685 | 80685 | 0 | t >>>>>>>> 80685 | 80685 | 1 | t >>>>>>>> 80685 | 80685 | 2 | t >>>>>>>> (3 rows) >>>>>>>> >>>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax >>>>>>>> they are actually >>>>>>>> the indexes of the internal array already explained above. 
>>>>>>>> combo id index 0 would contain cmin 3, cmax 6 >>>>>>>> combo id index 1 would contain cmin 4, cmax 6 >>>>>>>> combo id index 2 would contain cmin 5, cmax 6 >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# END; >>>>>>>> COMMIT >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>>> TRUNCATE TABLE >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# BEGIN; >>>>>>>> BEGIN >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>>> INSERT 0 1 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+------+-----------+------------- >>>>>>>> 80675 | 0 | 0 | f >>>>>>>> 80675 | 0 | 1 | f >>>>>>>> 80675 | 0 | 2 | f >>>>>>>> (3 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>>>> UPDATE 3 >>>>>>>> test=# >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+-------+-----------+------------- >>>>>>>> 80675 | 80675 | 0 | t >>>>>>>> 80675 | 80675 | 1 | t >>>>>>>> 80675 | 80675 | 2 | t >>>>>>>> 80675 | 0 | 3 | f >>>>>>>> 80675 | 0 | 3 | f >>>>>>>> 80675 | 0 | 3 | f >>>>>>>> (6 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# END; >>>>>>>> COMMIT >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>>> TRUNCATE TABLE >>>>>>>> test=# >>>>>>>> >>>>>>>> -- From one psql issue >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>>> INSERT 0 1 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+------+-----------+------------- >>>>>>>> 80677 | 0 | 0 | f >>>>>>>> (1 row) >>>>>>>> >>>>>>>> >>>>>>>> test=# -- From another issue >>>>>>>> test=# BEGIN; >>>>>>>> BEGIN >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>>> INSERT 0 1 >>>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>>>>>> INSERT 0 1 >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+------+-----------+------------- >>>>>>>> 80677 | 0 | 0 | f >>>>>>>> 80678 | 0 | 0 | f >>>>>>>> 80678 | 0 | 1 | f >>>>>>>> 80678 | 0 | 2 | f >>>>>>>> (4 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>>>> UPDATE 4 
>>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+-------+-----------+------------- >>>>>>>> 80678 | 80678 | 0 | t >>>>>>>> 80678 | 80678 | 1 | t >>>>>>>> 80678 | 80678 | 2 | t >>>>>>>> 80677 | 80678 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> (8 rows) >>>>>>>> >>>>>>>> test=# >>>>>>>> >>>>>>>> test=# -- Before finishing this, issue these from the first psql >>>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>>> is_combocid >>>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>>> -------+-------+-----------+------------- >>>>>>>> 80678 | 80678 | 0 | t >>>>>>>> 80678 | 80678 | 1 | t >>>>>>>> 80678 | 80678 | 2 | t >>>>>>>> 80677 | 80678 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> 80678 | 0 | 3 | f >>>>>>>> (8 rows) >>>>>>>> >>>>>>>> test=# END; >>>>>>>> COMMIT >>>>>>>> >>>>>>>> >>>>>>>> Now consider the case we are trying to solve >>>>>>>> >>>>>>>> drop table tt1; >>>>>>>> create table tt1(f1 int); >>>>>>>> >>>>>>>> BEGIN; >>>>>>>> insert into tt1 values(1); >>>>>>>> declare c50 cursor for select * from tt1; -- should show one row >>>>>>>> only >>>>>>>> insert into tt1 values(2); >>>>>>>> fetch all from c50; >>>>>>>> COMMIT; >>>>>>>> >>>>>>>> >>>>>>>> Consider Data node 1 log >>>>>>>> >>>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>>> committed READ WRITE] >>>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >>>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>>> committed READ WRITE] >>>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>>> committed READ WRITE] >>>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() >>>>>>>> FROM tt1] >>>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>>>>>>> >>>>>>>> The cursor currently shows both inserted rows because command id at >>>>>>>> data node in >>>>>>>> step (j) is 0 >>>>>>>> step (k) is 1 & >>>>>>>> step (l) is 2 >>>>>>>> >>>>>>>> Where as we need command ids to be >>>>>>>> >>>>>>>> step (j) should be 0 >>>>>>>> step (k) should be 2 & >>>>>>>> step (l) should be 1 >>>>>>>> >>>>>>>> This will solve the cursor visibility problem. >>>>>>>> >>>>>>>> To implement this I suggest we send command IDs to data nodes from >>>>>>>> the coordinator >>>>>>>> like we send gxid. 
The only difference will be that we do not need >>>>>>>> to take command IDs >>>>>>>> from GTM since they are only valid with in the transaction. >>>>>>>> >>>>>>>> See this example >>>>>>>> >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> ------+------+------+------+---- >>>>>>>> (0 rows) >>>>>>>> >>>>>>>> test=# begin; >>>>>>>> BEGIN >>>>>>>> test=# insert into tt1 values(1); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> (1 row) >>>>>>>> >>>>>>>> test=# insert into tt1 values(2); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> (2 rows) >>>>>>>> >>>>>>>> test=# insert into tt1 values(3); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> (3 rows) >>>>>>>> >>>>>>>> test=# insert into tt1 values(4); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>>> (4 rows) >>>>>>>> >>>>>>>> test=# end; >>>>>>>> COMMIT >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>>> (4 rows) >>>>>>>> >>>>>>>> test=# insert into tt1 values(5); >>>>>>>> INSERT 0 1 >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>>>> (5 rows) >>>>>>>> >>>>>>>> test=# insert into tt1 values(6); >>>>>>>> INSERT 0 1 >>>>>>>> test=# >>>>>>>> test=# >>>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>>> -------+------+------+------+---- >>>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>>>> 80617 | 0 | 0 | 0 | 6 >>>>>>>> (6 rows) >>>>>>>> >>>>>>>> Note that at the end of the multi-statement transaction the command >>>>>>>> id gets reset to zero. 
>>>>>>>> >>>>>>>> -- >>>>>>>> Abbas >>>>>>>> Architect >>>>>>>> EnterpriseDB Corporation >>>>>>>> The Enterprise PostgreSQL Company >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> -- >>>>>>> Abbas >>>>>>> Architect >>>>>>> EnterpriseDB Corporation >>>>>>> The Enterprise PostgreSQL Company >>>>>>> >>>>>>> Phone: 92-334-5100153 >>>>>>> >>>>>>> Website: www.enterprisedb.com >>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>>> >>>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>>> the individual or entity to whom it is addressed. This message >>>>>>> contains information from EnterpriseDB Corporation that may be >>>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>>> law. If you are not the intended recipient or authorized to receive >>>>>>> this for the intended recipient, any use, dissemination, >>>>>>> distribution, >>>>>>> retention, archiving, or copying of this communication is strictly >>>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>>> the sender immediately by reply e-mail and delete this message. >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> Live Security Virtual Conference >>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>> Discussions >>>>>>> will include endpoint security, mobile security and the latest in >>>>>>> malware >>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>> _______________________________________________ >>>>>>> Postgres-xc-developers mailing list >>>>>>> Pos...@li... >>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Michael Paquier >>>>>> http://michael.otacoo.com >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>>> >>>>> Phone: 92-334-5100153 >>>>> >>>>> Website: www.enterprisedb.com >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>> >>>>> This e-mail message (and any attachment) is intended for the use of >>>>> the individual or entity to whom it is addressed. This message >>>>> contains information from EnterpriseDB Corporation that may be >>>>> privileged, confidential, or exempt from disclosure under applicable >>>>> law. If you are not the intended recipient or authorized to receive >>>>> this for the intended recipient, any use, dissemination, distribution, >>>>> retention, archiving, or copying of this communication is strictly >>>>> prohibited. If you have received this e-mail in error, please notify >>>>> the sender immediately by reply e-mail and delete this message. >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> Live Security Virtual Conference >>>>> Exclusive live event will cover all the ways today's security and >>>>> threat landscape has changed and how IT managers can respond. 
>>>>> Discussions >>>>> will include endpoint security, mobile security and the latest in >>>>> malware >>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>> _______________________________________________ >>>>> Postgres-xc-developers mailing list >>>>> Pos...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>> >>> >>> >>> -- >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >>> >>> Phone: 92-334-5100153 >>> >>> Website: www.enterprisedb.com >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. >> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. > > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. 
If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
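The heap_page_items()/get_raw_page() queries quoted throughout the command id discussion above come from the contrib module pageinspect, which has to be installed in the test database before they can be reproduced. A minimal setup, assuming a PostgreSQL 9.1-based build where contrib modules ship as extensions and a superuser connection:

-- installs heap_page_items(), get_raw_page() and related functions into the current database
CREATE EXTENSION pageinspect;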